How the Atari ST almost had Real Unix

It was 4:00 in the afternoon, and time to walk to the dump, out past the Sunnyvale sewage plant, and talk about hard problems.

I’ve had some of my best ideas while out walking. I’m 6’4″ with long legs, and I walk fast. I find when my legs are occupied I can let things sift through my head and sort of bounce around until they fall into place. I have to leave the cell phone and music player and other distractions behind or I won’t get work done. Without distractions, I just get out there and let the ground travel past my feet, and things get solved. At least it’s good exercise.

That, or just go out for a walk with cow-orkers and enjoy the day, and bullshit about stuff.

The latest problem I was working out was how to run Unix on the Atari ST. The Tramiels had somehow wrangled a license for AT&T’s SVR-something-or-other version of Unix (might have been SVR3, but this was in the bad old days when AT&T was actively fucking up Unix, and it could have been just about any version, including SVR666). The license was for a mind boggling, nay, jaw-dropping ten bucks a seat. The problem was that the ST didn’t have any kind of memory management hardware, just a raw CPU flinging real addresses at naked DRAM, and the machine’s cheap-ass vanilla 68000 was incapable of recovering from a fault unless you cheated.

[What’s that about Linux? Dear child, Linus was probably not out of whatever they use for high school in Finland. All we had in the market was 4.2bsd running on Vaxen, 68K-based Suns with a ton of hardware to work around the 68000 limitation re faulting, and a whole running field of wannabes that would sink without a trace in five years. Oh, and some screwed up AT&T workstations with monochrome graphics and UIs that curdled your eyeballs and left you wishing AT&T had simply stuck with making phones.]

The hardware folks were convinced that grafting an MMU into the ST was impossible; in theory you could still run something like Unix, but with no memory protection and no way to easily grow and shrink a process’s address space, a straightforward port of Unix would be glacial and prone to crashing really badly. The hardware guys were mostly right; the 68000 wasn’t capable of handling a page fault (it didn’t save enough information on its exception frames to restart all cases of a faulted instruction). Motorola didn’t offer an MMU chip anyway (the 68020 didn’t exist yet, and the sticker shock of its optional external MMU meant that only Apple folks could afford it, and it was still optional on most Macs for several years). Furthermore, the memory system of the ST wouldn’t tolerate the delays that a traditional MMU would incur; the ST’s DRAMs were being thrashed ten or fifteen nanoseconds under spec (“You have to understand,” said our hardware guys, “DRAMs are really analog devices,” and I’m sure a DRAM designer somewhere felt cold and shivery all of a sudden, and didn’t know why).

To run Unix effectively we needed some hardware that was very fast, that was simple enough to put into a minor spin of the ST’s memory controller with little project risk, and that would still provide some kind of memory relocation and protection. The ability to have separate address spaces to isolate processes would be good, too.

“If you can come up with something that takes about a gate delay, I’ll put it in,” said John, the memory controller guy. He seemed dubious, but willing to listen.

I went for a bunch of walks.

– – – –

In the early 80s, eastern Sunnyvale bordered southern San Francisco Bay with a landfill hill (a large, long mound maybe a hundred feet high), and a sewage treatment plant just beyond. Beyond these were settling ponds for the sewage, separated by a large number of wandering dikes upon which were set miles upon miles of paths for walking. I never exhausted the paths. It was easy to get your heart pumping and your legs swinging and let your head fly off into some tough technical nut. I never really noticed any smell; maybe once or twice. The winter rains washed the stink out of the air.

There were birds out there by the thousands, and any number of rodents. I saw an enormous heron once and realized why my parents had been so excited to see them nest in a marsh we’d lived near in Ohio.

We could also get a good view of planes at Moffett Field. Occasionally a U-2 would take off, shaking the ground slightly as it roared into the stratosphere to look (we were told) for pot fields in northern California, saving the world for democracy.

Then the path would loop back, and I’d bounce some ideas off of people. Eventually we got it.

– – – –

The MMUs I knew about did page table walks of a multi-level tree; those multiple indirections implied complex, stateful and slow machinery. There was no room in the ST’s memory controller for the caches required to make a table-based system perform reasonably, even if the gate count of table-lookup hardware had been possible. The ST was no VAX. We had to pay dearly for chip area, schedules were tight, and DRAM timing was even tighter. Nobody wanted to pay for a feature they’d never use.

Non-MMU-based systems used base-and-bounds; a favorite technique in mainframes and minis from the 60s and 70s. We could get protection by checking user accesses against limit registers, a pretty cheap operation, but that wouldn’t get you relocation. To do that you had to muck with the address bits, and do an addition.

The problem was, there wasn’t time to do an addition with the necessary carry-propagation on every single address issue, not to mention the gate count.

So how does a typical Unix process grow? The text and data are at the bottom of the address space and don’t move; the bss follows those, and grows up via the “brk.” The stack grows down. That’s it. Very simple, very hippy 70s.

So imagine something really minimal, like replacing a bunch of address lines with a chosen constant value for user-mode accesses. Leave everything untouched for supervisor accesses. That’s it, that’s your total relocation. It’s really simple to implement in hardware, just a latch and some muxes to choose which address lines go through and which get replaced.

For address space growth you have another register, a mask or a bit count, that checks to see if some number of the issued upper address bits are either all zero or all one. You start the stack at 0xfffffffe and grow down. You start the bss low and grow up. A variable number of bits in the middle of each address are simply substituted. If the upper N bits aren’t all 0000…00 or 11111…11 then you generate a fault.
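The check-and-substitute step can be sketched in a few lines of Python. This is only a software model under assumptions, not the actual hardware: the names `translate`, `base` and `size_bits` are made up for illustration, addresses are treated as 32-bit values as written above, and everything above the pass-through low bits is replaced by the latch value.

```python
ADDRESS_BITS = 32  # addresses are modelled as 32-bit values


class Fault(Exception):
    """Raised where the hardware would signal a bus error."""


def translate(vaddr, base, size_bits, supervisor=False):
    """Map a CPU-issued address to a physical address.

    base      -- hypothetical latch value: physical start of the tile,
                 aligned to 2**size_bits
    size_bits -- hypothetical bit-count register: the tile holds
                 2**size_bits bytes
    """
    if supervisor:
        return vaddr  # supervisor accesses pass through untouched

    low_mask = (1 << size_bits) - 1
    upper = vaddr >> size_bits
    all_ones = (1 << (ADDRESS_BITS - size_bits)) - 1

    # The upper bits must be all zeros (text/data/bss growing up)
    # or all ones (stack growing down); anything else is a fault.
    if upper != 0 and upper != all_ones:
        raise Fault(hex(vaddr))

    # Substitute the upper bits with the latch; low bits pass through.
    return base | (vaddr & low_mask)
```

With a 64K tile at a hypothetical physical base of 0x00080000, `translate(0x10, 0x00080000, 16)` lands near the bottom of the tile and `translate(0xFFFFFFFE, 0x00080000, 16)` lands at its top, while an address such as 0x00010000 raises `Fault`. Note there is no adder anywhere: the whole thing is a compare and an OR, which is the point of the scheme.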

Now you have a system that both relocates addresses and handles growth in two directions in powers of two. You use throwaway instructions to do stack growth probing (dud sequences that don’t need to be restarted if they fault), and that needs a little compiler work, but it’s not too bad. Processes are tiled in memory at power-of-two addresses, so there’s more physical copying going on than you probably like when stuff grows, but again, it’s not too bad. Welcome to the world of doesn’t-totally-suck, and probably runs rings around a PDP-11 despite its limitations. AT&T SVR-N didn’t have paging anyway (like I said, they should have stuck with phones).

– – – –

John Horton, the memory chip guy, actually did this hardware for the Mega ST; I don’t know if it’s documented, or if he had to sneak it in or not. I do know that it was not used for Unix in my time at Atari; the deal with AT&T expired before the hardware existed, and frankly, supporting Unix probably would have been a massive effort, and one that the Atari software group would have been unable to adequately support. I vaguely recall some Unix-a-like environments for the ST, but I pretty much lost interest in the ST after I left Atari in 1987.

I recall talking about this scheme with John Gilmore, who took a kind interest and asked some good probing questions. We had some great conversations at some otherwise strained hacker dinners in Berkeley. (I’ll talk about South Bay versus North Bay geeks some other time…).

The thing I never quite understood was how Motorola let the 68K out of the door with such a broken page fault implementation.

I have several other hates on the 68K family. I designed a networked chip for a mixing desk (nominally one per channel of audio) that put 16 bit values over the wire (lsb first as per usual for HDLC), which proved to be a complete nightmare to work with on the 68K-based controller that we were using (68302 IIRC), as there is no equivalent to BSWAP / xchg ah,al. Annoying, as I could have fixed it in hardware had I known.

Regarding the sewage treatment, dump, walking paths, and great view at Borregas and Caribbean – it’s still there, and a lot of people use it for thinking still. I live right around the corner from it (1.6 miles), and take my kids there and tell them about the pioneers of modern computing who used to roam the area. It’s nice to see this article – I do miss the hardware hacking days. I personally never worked for Atari, though it was something of a dream of mine as a child to do so. I did build the ring detectors for the non-autoanswer modems, and hack my floppy drive for higher density, and build WESAT interfaces and photo scanners from my dot matrix printer…

Interesting story… Indeed sometimes things just get over-engineered 🙂
It seems some PowerPC chips had mostly just some TLBs as their sole MMU, which is barely more complex, really. And I heard antique versions of WinCE were limited to 16 tasks, probably using a similar memory banking mechanism.

As for Atari, as mentioned, MiNT is as close to UNIX as possible while maintaining TOS compatibility, and it seems to be quite nice, and can even optionally use the MMU on the 68030. I recall seeing articles about Minix in the French ST Mag. (I also recall funky stories by Dave Small in there, like the design of the SST 030 or “Hacker’s memory”)

Now, if you are interested in giving a hand, I’d welcome help on my 68k Haiku port to Falcon & friends, mostly started out of frustration of not having had an Atari myself back then…
I’m currently fixing used-to-be-working-but-broken-due-to-x86-changes mmu code in the kernel. See the boot menu and splash 🙂

“The thing I never quite understood was how Motorola let the 68K out of the door with such a broken page fault implementation.”

Because it was largely competing with things that didn’t have any page fault implementation. The 68000 wasn’t designed to play in the same market as the VAX. It was designed to be the 6502 of its era.

I’ll bet if they had taken the time and gate count to have real page faults they would have missed the market window and the Macintosh, Atari ST, and Amiga would have had some other CPU. As would the Apollo Domain, Sun-1/Sun-2 and Sun-3, as well as a bunch of other early 80’s “microcomputer Unixish Systems”. Moto would have made way less money in the 80’s. In the long term though they would be in the same place they are now: “not in the CPU business”.

“All we had in the market was 4.2bsd running on Vaxen, 68K-based Suns with a ton of hardware to work around the 68000 limitation re faulting”

If I recall correctly the Atari ST came to market while the Sun-2 was current. The Sun-2’s faulting workaround was not in hardware. The OS checked to see if the faulting instruction was an XOR; if so, and if the target address was close to the top of the stack, the stack would be expanded. If it was anything else you got a SEGV. The Sun-2 didn’t do paging, it did whole process swaps.
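The decision described here is small enough to sketch. What follows is a hypothetical model, not Sun's kernel code: `on_bus_error`, the instruction names, `max_gap` and `granule` are all invented for illustration, and the instruction is passed in as a plain mnemonic string rather than decoded from memory.

```python
SEGV = None  # sentinel: the process gets a segmentation violation


def on_bus_error(faulting_insn, fault_addr, stack_low,
                 probe_insn="eor", max_gap=0x2000, granule=0x400):
    """Decide what to do with a bus error in user mode.

    stack_low -- lowest address currently mapped for the stack
    max_gap   -- how far below stack_low still counts as "close"
    granule   -- the stack is extended in multiples of this size

    Returns the new lowest stack address, or SEGV.
    """
    # Only the agreed-upon probe instruction is ever forgiven. Its
    # result is not needed, so the kernel can resume after it
    # instead of restarting it.
    if faulting_insn != probe_insn:
        return SEGV

    # The probe must land just below the existing stack.
    if not (stack_low - max_gap <= fault_addr < stack_low):
        return SEGV

    # Extend the stack down to cover the probed address.
    return fault_addr - (fault_addr % granule)
```

For example, `on_bus_error("eor", 0x7FF0, 0x8000)` returns 0x7C00 as the new stack bottom, while the same fault from any other instruction, or a probe far from the stack, returns `SEGV`.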

I heard the Apollos of the era used two 68000s that ran at a single clock cycle offset; the “leader” handled the page fault, then let the “trailer” finish the access, and did some sync-up dance and maybe swapped positions. Very costly. The Apollo did have a pager.

The Sun-2 might have had an adder though. I’m not sure.

The Sun-3 had a real MMU, but it also had a 680×0 that supported one (I can’t remember if it was a 68010 or a 68020 though).

(I couldn’t afford a Sun at the time, my dad had one at work, or more accurately his group had one…I had an Atari ST. I wanted an Amiga. In the long run the effect was the same, I learned C and 68k assembly, dreamed of writing video games, eventually got a job as a CoinOp game programmer, and found out things in real life are seldom as fun as daydreams, and have been programming other things ever since I left that job)

No. The Sun-1 was a 68000 with a fairly simple Sun (well…really UC Berkeley) designed swapping MMU (I forget the details). The Sun-2 had a 68010 and a Sun custom PMMU. The Sun-3 had a 68020, again with a proprietary PMMU, for most of its run, and at the very tail end (as they were preparing to move to SPARC) there were a few models with the 68030 (referred to as the Sun-3x line).

Such great memories. People I work with have no idea how things were ‘back in the day’

Relevant to your story, I always thought the 68k was an amazing jump. It is only in hindsight that people label this chip a 16/32 bit chip. IMHO, if you have 32 bit regs, you are a 32 bit chip. This was quite an amazing jump in an era that was still firmly 8 bit.

Anyways, if I recall, you could put a 68010 in an Amiga and this would fix most of the issues with the older 68k.

Also, a lot of early computers had bank switching or memory mappers. They didn’t have the protection bits, but they did allow you to remap the address space. It would have been trivial to add protection bits on some of these systems. Yet, the systems were so small, it wasn’t a consideration.

There was a unix like OS on it that actually used the MMU. Pages were 8k and it could only have 8 of them given the 6809’s 64kb address space. You could have several processes running on such a system and it worked (an editor and a compiler, etc.).
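A mapper of that shape is easy to model. The sketch below assumes eight page registers per process, each holding a physical page number; the function name `map_address` and the register layout are illustrative assumptions, not the behaviour of any particular 6809 machine.

```python
PAGE_BITS = 13                # 8 KB pages
PAGE_SIZE = 1 << PAGE_BITS
NUM_SLOTS = 8                 # 8 slots x 8 KB = 64 KB logical space


def map_address(logical, page_regs):
    """Translate a 16-bit logical address through per-slot page registers.

    page_regs -- sequence of NUM_SLOTS physical page numbers
                 (hypothetical register file, one set per process)
    """
    if not 0 <= logical < NUM_SLOTS * PAGE_SIZE:
        raise ValueError("logical address out of range")
    slot = logical >> PAGE_BITS           # top 3 bits pick the register
    offset = logical & (PAGE_SIZE - 1)    # low 13 bits pass through
    return (page_regs[slot] << PAGE_BITS) | offset
```

Two processes then see the same logical address in different physical pages: with `editor = list(range(0, 8))` and `compiler = list(range(8, 16))`, logical 0x2004 maps to 0x2004 for the editor and 0x12004 for the compiler. Switching processes means reloading eight registers, with no copying.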

The difference between what we have today and then is astounding. Yet, in some dimensions, not much has changed.

I recall reading about an early-90s effort to bring Unix System V to the Atari TT030 that was near completion, but never reached market. Did a little Google search and found a couple interesting links:

> For address space growth you have another register, a mask or a bit count, that checks to see if some number of the issued upper address bits are either all zero or all one. You start the stack at 0xfffffffe and grow down. You start the bss low and grow up. A variable number of bits in the middle of each address are simply substituted. If the upper N bits aren’t all 0000…00 or 11111…11 then you generate a fault.

This is a really clever scheme.

Suppose I have a 64K process, occupying 0xABCD0000 to 0xABCDFFFF

The process thinks it’s using address 0 and up for the code and data and 0xFFFFFFFE and down for the stack.

In between there’s a 4G-64K gap. So the 68K generates an address and you can replace the top 16 bits of it with 0xABCD. If it tries to write outside its allocation, the rule about all ones or all zeros in the high-order bits is broken.

And you can grow the allocation. In fact the power of two nature of it lends itself well to a buddy list so it’s not going to fragment.
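Growing means moving to a tile twice the size, with the stack re-pinned to the new top. Here is a rough sketch of that copy over a simulated physical memory; `grow_tile`, `low_used` and `high_used` are hypothetical names, and the assumption is that the kernel knows how many bytes are in use at each end of the old tile.

```python
def grow_tile(memory, old_base, old_bits, new_base, low_used, high_used):
    """Copy a process from a 2**old_bits tile into one twice the size.

    memory    -- bytearray standing in for physical RAM
    low_used  -- bytes in use at the bottom (text, data, bss)
    high_used -- bytes in use at the top (stack)

    Returns (new_base, new_bits) for the relocation registers.
    """
    old_size = 1 << old_bits
    new_size = old_size * 2
    if new_base % new_size != 0:
        raise ValueError("new tile must be aligned to its own size")
    if low_used + high_used > old_size:
        raise ValueError("usage exceeds the old tile")

    old_top = old_base + old_size
    new_top = new_base + new_size

    # Text/data/bss stay at the bottom of the tile. The right-hand
    # slice is copied first, so overlapping tiles are safe here.
    memory[new_base:new_base + low_used] = memory[old_base:old_base + low_used]

    # The stack must end up at the top of the new, larger tile.
    memory[new_top - high_used:new_top] = memory[old_top - high_used:old_top]

    return new_base, old_bits + 1
```

When the old tile is the lower half of its buddy pair and the upper half is free, `new_base` equals `old_base` and only the stack actually moves. Otherwise both ends get copied, which is the extra physical copying the article mentions.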

Just a quick note for anyone who bounces along to this thread in the future. I sent a quick email over to one of the 68K design team (Nick Tredennick) and his reply (cut down a bit here) was

So, the answers to your questions are. It didn’t come up until the design was almost complete; virtual memory wasn’t in the specification; and it wasn’t obvious (at least to me) at the time because there were very few microprocessors in computer implementations.

The 68010 and later had a recoverable bus-error (page fault). The Amigas had daughterboards you could put in with the 68020 and later processors which had full capabilities (and the 68881 floating point unit).

Masscomp used twin 68000s and would freeze one if it would fault and use the second to fix things to run their version of unix.

The Mac emulator (Magic Sac by Dave Small and friends) used some bizarre but effective techniques to recover from page faults. There wasn’t enough info on the stack by itself to recover, but you could go through the place it happened and recover (null pointer derefs were a particular problem).

“In 1986, David M. Stanhope and Skip Tavakkolian at Computer Tools International ported Idris to the Atari ST and developed its ROM boot cartridge. This work also included a port of X to Idris. Computer Tools and Whitesmiths offered it to Atari as a replacement for Atari TOS, but eventually marketed it directly to ST enthusiasts.”

Idris was a clean-room rewrite of UNIX v6, specially crafted to not require an MMU but could use one if present – in non-MMU systems it did full-process-swaps and relocation-only-on-process-startup (ie: programs could be loaded anywhere in memory, relocated at load-time by the OS, but once loaded were never subsequently moved, except to the swap-device when needed). Idris was ported to PDP-11, IBM 370, a multitude of 68K systems (including a version that ran as a Finder App on Macintosh – a very early VM), transputer, VAX, and even an 8080 system using bank-switched memory!
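Relocation-only-on-startup can be illustrated with a toy loader. This is a generic sketch of load-time fixups, not Idris's real object format: the image is modelled as a list of 32-bit words, and `fixups` is a made-up table of indexes whose words hold addresses relative to the start of the program.

```python
def load_relocated(image_words, fixups, load_base):
    """Return a copy of the image with its absolute addresses rebased.

    image_words -- program image as a list of 32-bit words
    fixups      -- indexes of words that hold program-relative addresses
    load_base   -- physical address the OS chose for this load
    """
    loaded = list(image_words)
    for index in fixups:
        # Patch once, at load time; the program is never moved again
        # except out to swap and back to the same place.
        loaded[index] = (loaded[index] + load_base) & 0xFFFFFFFF
    return loaded
```

For instance, `load_relocated([0x10, 0x1234, 0x20], [0, 2], 0x40000)` yields `[0x40010, 0x1234, 0x40020]`: the two address words are rebased and the plain data word is left alone. The cost is paid once per load rather than on every memory access, which is why it needs no MMU at all.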

In 1989/1990 I worked in the team that ported Idris to the Parsytec SN1000 multi-transputer system, which had considerably more unconventional CPU-isms than the 68K, and at a time when multiple CPUs in a single system image was considered weird and dangerously new-fangled 🙂

Translation from the wiki article:
…
“Prototypes of the TT/X were shown on the ATARI-Messe in Düsseldorf, Germany.
It was a Unix System V Release 4 compatible system, one of the first SVR4 systems anyway.
This version was delivered on tape or hard disc to developers in the end of 1990, but never to customers.”
…

They gave me, at the time, opportunity to port software like gcc, bash, xnews, mosaic, gopher, gnuplot, TeX/LaTeX to the TT. In fact, I used these systems to process all the data for my thesis, doing lots of model calculations, and finally generating the postscript for publishing the thesis.

These machines are lost, and I still regret that I wasn’t able to save them.

I miss those days when you had to count t-states because at 4MHz every clock cycle mattered. When we cared about optimizing code because the effect for the end user was perceptible. The clever creativity was such a huge attraction to me as an Engineer. Now it seems people just shovel one framework on top of some open source they grab off of github, never looking under the hood to see if the code is decent or not. Guilty party here. Who cares? You’ve clock and memory to burn. And SHIP IT! At least mobile is re-awakening a type of old school ingenuity to a certain extent, because you have to conserve battery, RAM and mobile data utilization. Even though Apple is more or less doubling their hardware capabilities every year or two on mobile, conserving battery will always be a goal.

Thanks for sharing this story, my kindred brother. I waited with bated breath for the ST with Unix. That was going to be awesome! Or any micro with Unix. Frustrated with CP/M and DOS, the guys I worked with even contemplated going in together to buy a used Vax 11/780 when people were dumping them via magazine ads for about $10K. We would have each been ST customers.

I was porting Unix for a living from ’84 onwards. You could do base and bounds and swapped (but not contiguously loaded) systems on the 68000 – bss doesn’t need to auto extend, and stack extension could be done with a test probe instruction in subroutine entrance code. Because the kernel can recognise that instruction and do the right thing, it didn’t need to be restarted in the same way that a true page fault does.

Motorola did make an MMU for the 68000/68010; it sucked but could be made to work. Lots of people built their own clones of Sun’s – basically a pile of SRAM, sort of really big pages in smallish (1MB) address spaces.

AT&T’s SVR2 was a true paging kernel; the previous V6/7, System III, and plain V were swapping kernels.

I’ve been asking in a German Atari forum whether there’s been an OS which used this “pseudo MMU”, and I’ve been told that it wasn’t implemented in the Mega ST, but in the EST prototype (which never made it to the market, but morphed into the TT030).