Bare Metal PC Hacking 1 - startup and boot loader

Boot process

So, I want to write something that will make it easier to make bootable
PC demos and/or games. I want to just bring up the computer and run my
code. No middlemen. No operating systems. Where to start?

I'm targeting the IBM PC compatibles (is this term used any more?) so
let me touch briefly on how a PC starts.

As in pretty much all computers, there's a ROM chip sitting on the bus,
handling a certain part of the memory address space, which just so happens
to include the address from where the processor is designed to start
executing instructions when it's powered up, or after a reset. This ROM
chip holds a program starting at that address, which initializes what
needs initializing, and then looks for something else to run that will
(eventually) bring up the operating system.

So far I might have been talking about any computer system, so let's
make it more specific. In the case of the IBM PC, the program which
resides in ROM is called the BIOS. The processor is an Intel x86 or
compatible. The reset vector is at linear address ffff0,
at the top of the original 8086/8088 1MB address space. And the BIOS,
after it finishes initializing the system, looks for a valid boot sector
to load, in a number of storage devices, and in an order usually
configurable through a BIOS configuration menu.

The boot sector is always the very first sector (512-byte block) of a
storage device, and in order for the BIOS to consider it valid, its last 2
bytes must be: 55 aa. When the BIOS finds a valid boot sector, it
proceeds to load it into memory at address 7c00, and execute a
jump to that address to start executing what is presumably code that will
start up the operating system. That code is called a "boot
loader", and it's the first thing we'll need to write, if we're
serious about bare metal hacking on this thing.
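To make the validity rule concrete, here is a minimal sketch in Python (used only as a host-side illustration, not as part of the boot loader) that pads some boot code to a full sector and appends the signature. The names `make_boot_sector` and `is_valid_boot_sector` are made up for this example, and the two-byte placeholder program is just an x86 jump-to-self.

```python
SECTOR_SIZE = 512
BOOT_SIGNATURE = bytes([0x55, 0xAA])


def make_boot_sector(code: bytes) -> bytes:
    """Pad `code` with zeros to 510 bytes and append the 55 aa signature."""
    max_code = SECTOR_SIZE - len(BOOT_SIGNATURE)
    if len(code) > max_code:
        raise ValueError(f"boot code is {len(code)} bytes, limit is {max_code}")
    return code.ljust(max_code, b"\x00") + BOOT_SIGNATURE


def is_valid_boot_sector(sector: bytes) -> bool:
    """Mimic the BIOS check: a full sector whose last two bytes are 55 aa."""
    return len(sector) == SECTOR_SIZE and sector[-2:] == BOOT_SIGNATURE


# Placeholder boot code: eb fe is a short jump to itself (an endless loop).
image = make_boot_sector(bytes([0xEB, 0xFE]))
```

Writing `image` to the first sector of a disk image should be enough for a BIOS to accept it as bootable, even though all it does is hang.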

Boot loader design and operation

If it's not immediately apparent, let me point out that 510 bytes is
way too small to fit any particularly useful operating system, or in this
case: bootable demo or game. So the piece of code I'll put into the boot
sector must load some more sectors off the original boot storage device
(let's call it disk from now on for brevity), with the rest of my code
and jump to it; hence the name: boot loader.

Interestingly, it's way more complicated than that... There are a
number of obstacles that make it nigh impossible to fit even a reasonable
boot loader in the boot sector.

The processor starts up in a mode emulating its aforementioned 16-bit
forerunners, called "real mode", and can access a mere 1MB of memory, in a
horrific segmented memory model. In this Lovecraftian fever dream, all
memory accesses are done by combining the value of a 16-bit segment
register with a 16-bit offset, to produce the actual 20-bit address that
will go out to the bus. Specifically, the 16 bits of the segment register
are shifted 4 bits to the left, and added to the 16-bit offset.
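The shift-and-add rule is easy to check with a few lines of Python. This is only a sketch of the arithmetic; `linear_address` is a name invented here, and the wrap to 20 bits models the original 8086/8088 behaviour (later processors can produce a 21st address bit when the A20 line is enabled).

```python
def linear_address(segment: int, offset: int) -> int:
    """Real-mode translation: (segment << 4) + offset, kept to 20 bits."""
    if not (0 <= segment <= 0xFFFF and 0 <= offset <= 0xFFFF):
        raise ValueError("segment and offset must be 16-bit values")
    # Assumption: wrap around at 1MB like the 8086/8088 did.
    return ((segment << 4) + offset) & 0xFFFFF


# Two different segment:offset pairs that name the boot sector load address.
boot_a = linear_address(0x0000, 0x7C00)
boot_b = linear_address(0x07C0, 0x0000)

# ffff:0000 is the 8086 reset vector.
reset = linear_address(0xFFFF, 0x0000)
```

Note that many segment:offset pairs map to the same linear address, which is one of the reasons this model is so unpleasant to work with.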

As I don't want to constrain my programs to fit in 1MB, and certainly
don't intend to write my programs in this grotesque memory model, the boot
loader will need to switch the processor to 32-bit protected mode before
it loads the rest of my program into memory. In fact I intend to load my
whole main program starting at the 1MB mark, and it's impossible to do
that from real mode.

So I'll make what is known as a two-stage boot loader. The first stage,
which is loaded by the BIOS and must fit in 510 bytes, will be a very simple
real-mode program which loads the rest of the boot loader (which can, and
will, be much larger) and jumps to it. The second stage will then switch to
32-bit protected mode, load the whole main program without any size
limitations, and start it in a sane 32-bit execution environment with a
linear memory model.

The next question that needs to be answered is how to load the second
stage from disk? This one turns out to be simple enough, because BIOS
conveniently provides a number of services, one of which is reading a
bunch of sectors from a disk into a memory buffer. Before it gives us
control, the BIOS has hooked the interrupt vector 13h, which we can call by
issuing the int instruction (software interrupt). When we do, the
interrupt handler in the BIOS takes control, checks the value of the
ah register where it expects to find the number of the operation
we want, and performs that operation.

The BIOS call number for reading sectors off a disk is 2. That call
expects a number of arguments in other registers. Specifically it expects
the number of sectors to read in al, the 10-bit cylinder number split
between ch (low 8 bits) and the highest 2 bits of cl, the head in dh, the
starting sector (counted from 1) within the track defined by cylinder and
head in the lowest 6 bits of cl, the device number in dl, and finally the
destination pointer in es:bx (segment:offset). The sectors loaded by a
single call must all be within a single track on the disk, and within the
same 64k segment in the destination.
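The register layout for this call is fiddly enough that it helps to see it spelled out. The following Python sketch only computes the values that would go into ax, cx and dx, plus the largest sector count a single call can safely request; it does not touch any hardware. The function names and the example drive numbers are my own choices for illustration.

```python
SECTOR_SIZE = 512


def pack_read_registers(cylinder, head, sector, count, drive):
    """Return (ax, cx, dx) for int 13h with ah=2 (read sectors)."""
    if not 0 <= cylinder <= 1023:
        raise ValueError("cylinder must fit in 10 bits")
    if not 1 <= sector <= 63:
        raise ValueError("sector is 1-based and must fit in 6 bits")
    if not (0 <= head <= 255 and 0 <= drive <= 255 and 1 <= count <= 255):
        raise ValueError("head, drive and count are 8-bit values")
    ch = cylinder & 0xFF                            # low 8 bits of the cylinder
    cl = (((cylinder >> 8) & 0x03) << 6) | sector   # high 2 bits + sector
    ax = (0x02 << 8) | count                        # ah = 2, al = sector count
    cx = (ch << 8) | cl
    dx = (head << 8) | drive                        # dh = head, dl = device
    return ax, cx, dx


def max_sectors_per_call(sector, sectors_per_track, dest_offset):
    """Largest count that stays inside the current track and 64k segment."""
    left_in_track = sectors_per_track - sector + 1
    left_in_segment = (0x10000 - dest_offset) // SECTOR_SIZE
    return min(left_in_track, left_in_segment)


# Read 1 sector from cylinder 0, head 0, sector 2 of the first floppy.
floppy_regs = pack_read_registers(0, 0, 2, 1, 0x00)
```

A loader that reads more than one track would call something like `max_sectors_per_call` before every read, then advance the sector, head, cylinder and destination pointer accordingly.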

BIOS calls expect to run in real mode. After I switch to protected mode
in the second stage boot loader, I'll have to jump through a few more hoops
(virtual 8086 mode) to keep using them to load the main program. The reason
I want to use the BIOS for reading even after the switch to protected mode,
instead of writing a fully 32-bit driver, is that I want to allow booting
from any device supported by the BIOS (floppy, USB stick, CD-ROM, etc), and
I'd rather not write drivers for USB in particular, especially not in the
boot loader, which I intend to keep as simple as possible.

First stage code and memorable bugs

Here's a video of the first test. It loads a dummy second stage which just
draws something on screen to help me completely debug the first stage
loader, before moving on to write the actual second stage loader.

I also managed to fit some code to print text and numbers, both to the
screen and the serial port, which helped a lot in debugging.

One bug I had initially was a failure to load correctly when booting from
a USB stick instead of a floppy. The cause, predictably, was that I used
hardcoded floppy parameters for the linear sector to CHS
(cylinder/head/sector) address translation, while the BIOS would emulate a
large disk with arbitrary CHS geometry when booting from the USB stick.
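To show why hardcoded parameters break, here is a sketch of the usual linear sector to CHS translation, in Python for illustration only. The function name `lba_to_chs` is mine, the 2 heads and 18 sectors per track are the standard 1.44MB floppy geometry, and the 255 heads and 63 sectors per track are just an assumed example of what a BIOS might report for an emulated disk. One way to get the real values at run time is the BIOS "get drive parameters" call (int 13h, ah=8), though this post doesn't show how the actual loader does it.

```python
def lba_to_chs(lba, heads, sectors_per_track):
    """Translate a 0-based linear sector number to (cylinder, head, sector)."""
    cylinder, remainder = divmod(lba, heads * sectors_per_track)
    head, sector_index = divmod(remainder, sectors_per_track)
    return cylinder, head, sector_index + 1  # sectors are counted from 1


# The same linear sector under two different geometries.
on_floppy = lba_to_chs(36, heads=2, sectors_per_track=18)
on_emulated_disk = lba_to_chs(36, heads=255, sectors_per_track=63)
```

With floppy geometry, linear sector 36 is the first sector of cylinder 1. With the assumed emulated geometry it is still on cylinder 0, head 0, so a loader using the floppy numbers would ask the BIOS for the wrong place on the disk.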

Another bug was that I was uploading only half of the palette to the VGA
DAC in the test program, because I used a jno (jump if not overflow)
instead of a jnc (jump if not carry) instruction to control the loop.
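The difference between the two flags can be reproduced with a small simulation. This Python sketch assumes the loop uses an 8-bit index that starts at 0 and is stepped with an add of 1 (inc would not do, since it leaves the carry flag untouched); the original assembly isn't shown here, so treat this as a model of the bug rather than the actual code. `add8` and `count_passes` are names made up for the example.

```python
def add8(a, b):
    """8-bit add returning (result, carry, overflow), like the x86 add."""
    total = a + b
    result = total & 0xFF
    carry = total > 0xFF  # unsigned wrap past 255
    # Signed overflow: both operands have a sign bit the result lacks.
    overflow = ((a ^ result) & (b ^ result) & 0x80) != 0
    return result, carry, overflow


def count_passes(exit_flag):
    """Count loop passes until the chosen flag ("carry" or "overflow") is set."""
    index = 0
    passes = 0
    while True:
        passes += 1  # loop body: one palette entry written
        index, carry, overflow = add8(index, 1)
        if (carry if exit_flag == "carry" else overflow):
            return passes


with_jnc = count_passes("carry")     # loop while carry is clear
with_jno = count_passes("overflow")  # loop while overflow is clear
```

The overflow flag trips when the index goes from 127 to 128, the point where a signed byte changes sign, so the jno version stops after 128 of the 256 palette entries.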