ENABLING MULTI-PROCESSORS IN MY HOBBY OS

2015-10-19

I recently added multi-processor support to my homebrew OS. Here are the technical details. By the way,
chapters 8 and 10 of volume 3 of the Intel manuals are probably your best resource.

When the system starts, all but one of the CPUs are halted. We must signal the other CPUs to start.
I won't go into the details of how to bootstrap the processor. That step is straightforward:
go into protected mode, then set up paging and jump to long mode. This is very well covered in the Intel
manuals.

Basically, this is how we switch to protected mode:

// Before going any further, you must enable the A-20 line. Not covered in this example
push %cs /* remember, cs is 07C0*/
pop %ds
mov $GDTINFO,%eax
lgdtl (%eax)
mov %cr0,%eax
or $1,%al
mov %eax,%cr0 /* protected mode */
mov $0x08,%bx
// far jump to load the code selector into CS and flush the prefetch queue
ljmpl $0x10,$PROTECTEDMODE_ENTRY_POINT
GDTINFO:
// GDT INFO
.WORD 0x1F /* GDT limit: size of the 4-entry table in bytes, minus 1 */
.LONG . + 0x7C04 /* that will be the address of the beginning of the GDT table */
// GDT
.LONG 00
.LONG 00
// GDT entry 1. Data segment descriptor used during unreal mode
.BYTE 0xFF
.BYTE 0xFF
.BYTE 0x00
.BYTE 0x00
.BYTE 0x00
.BYTE 0b10010010
.BYTE 0b11001111
.BYTE 0x00
// GDT entry 2. Code segment used during protected mode code execution
.BYTE 0xFF
.BYTE 0xFF
.BYTE 0x00
.BYTE 0x00
.BYTE 0x00
.BYTE 0b10011010
.BYTE 0b11001111
.BYTE 0x00
// GDT entry 3. 64-bit code segment used for jumping to 64-bit mode.
// This is just used to turn on 64-bit mode. Segmentation is effectively unused once 64-bit code runs.
// The base and limit of this segment are ignored. When we jump into it we are already in long mode,
// but in the compatibility sub-mode, and the CPU looks at the D and L bits of the target code segment.
// So in long mode, segments are not entirely ignored: on a far jump to another code segment,
// the CPU checks the D and L bits and changes sub-modes accordingly.
// In long mode, segments have a different purpose: to change sub-modes.
.BYTE 0xFF
.BYTE 0xFF
.BYTE 0x00
.BYTE 0x00
.BYTE 0x00
.BYTE 0b10011010
.BYTE 0b10101111 // bit 6 (D) must be 0, and bit 5 (L, was reserved before) must be 1
.BYTE 0x00
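
To make the byte layout of the descriptors above easier to check, here is a minimal sketch in C (not the assembly used by the bootloader) that packs a base, limit, access byte and flag nibble into the 8-byte descriptor format. The function name gdt_encode and its parameters are made up for this illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Pack one 8-byte GDT descriptor into out[], lowest byte first.
 * access: the access byte (present, DPL, type bits).
 * flags:  upper nibble of byte 6, as G (bit 3), D (bit 2), L (bit 1), AVL (bit 0).
 * gdt_encode is a hypothetical helper, not part of the original bootloader. */
void gdt_encode(uint8_t out[8], uint32_t base, uint32_t limit,
                uint8_t access, uint8_t flags)
{
    assert(limit <= 0xFFFFF);   /* the limit is a 20-bit field */
    assert(flags <= 0xF);       /* the flags are a 4-bit field */

    out[0] = limit & 0xFF;                 /* limit 7:0   */
    out[1] = (limit >> 8) & 0xFF;          /* limit 15:8  */
    out[2] = base & 0xFF;                  /* base 7:0    */
    out[3] = (base >> 8) & 0xFF;           /* base 15:8   */
    out[4] = (base >> 16) & 0xFF;          /* base 23:16  */
    out[5] = access;                       /* access byte */
    out[6] = (uint8_t)((flags << 4) | ((limit >> 16) & 0x0F)); /* flags + limit 19:16 */
    out[7] = (base >> 24) & 0xFF;          /* base 31:24  */
}
```

With base 0 and limit 0xFFFFF, access 0x92 and flags 0xC give the data segment bytes of entry 1, access 0x9A and flags 0xC give entry 2, and access 0x9A and flags 0xA (G and L set, D clear) give the 0b10101111 byte of the 64-bit entry.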

Detecting the number of CPUs

The first thing to do is to detect the number of CPUs present. This can be done
by looking for the "MP floating pointer" structure. It is located somewhere in the BIOS
address space and we must find it. I won't go into the details of the structure since it is
well documented elsewhere. The MP structure contains information about the CPUs and IO APIC
on the system. This structure is filled in by the BIOS at boot time. It can be in several
places, which is why we must search for it in memory. It starts with "_MP_" and contains a checksum, so
by scanning the memory, you will find it. The important steps are the following:

Find the structure in memory. According to the specs, it can be in a couple of different places.

Detect the number of CPUs and the local APIC address of the CPUs.

Detect IO APIC address.

For more details on how to find the structure and its format, search for "Intel Multi-Processor Specification".
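
The search itself can be sketched as follows. This is a minimal illustration in C that scans a memory buffer on 16-byte boundaries for the "_MP_" signature and verifies the checksum (all bytes of the structure must sum to 0 modulo 256). The name mp_find is made up here, and the caller is assumed to pass a pointer to each candidate region listed in the specification.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Return the byte offset of the first valid MP floating pointer
 * structure in mem[0..len), or -1 if none is found.
 * mp_find is a hypothetical helper name used for this sketch. */
long mp_find(const uint8_t *mem, size_t len)
{
    assert(mem != NULL);

    /* The structure is aligned on a 16-byte boundary. */
    for (size_t off = 0; off + 16 <= len; off += 16) {
        const uint8_t *p = mem + off;

        if (memcmp(p, "_MP_", 4) != 0)
            continue;

        /* Byte 8 holds the structure length in 16-byte units. */
        size_t size = (size_t)p[8] * 16;
        if (size == 0 || off + size > len)
            continue;

        /* All bytes, checksum included, must add up to zero. */
        uint8_t sum = 0;
        for (size_t i = 0; i < size; i++)
            sum = (uint8_t)(sum + p[i]);

        if (sum == 0)
            return (long)off;
    }
    return -1;
}
```

Checking the checksum matters because the four signature bytes alone can appear by chance in BIOS data.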

When wandering in the SMP world, you must forget about using the PIC (Programmable Interrupt Controller).
The PIC is an old, obsolete device anyway. The new way is to use the APIC, so we won't be using
the PIC anymore. There is a notion of a local APIC and the IO APIC. The local APIC is an APIC
that is present on each CPU. The local APICs can be used to trigger interrupts from one CPU to
another, as a way of communication. When the system starts, all but one of the CPUs are halted.
We must signal the other CPUs to start. The PIC does not allow us to do that, which is
why we must use the APIC. The local APIC will allow us to trigger an interrupt on the other CPUs
to get them out of their halted state.

We must then set up the local APIC for the current CPU. Each CPU has its own local APIC, and it
is mapped at the same address on every CPU. The local APIC address is 0xFEE00000 by default.
So when CPU0 reads or writes at 0xFEE00000, it is not the same as when CPU1 reads or writes at 0xFEE00000,
since the address maps to each CPU's own APIC. This is nice because it means
you don't need to do something like "What CPU am I? Number x? OK, then use address xyz."
Each CPU only needs to write at the same address and it is guaranteed to write to its
own APIC. It's all transparent so you don't need to worry about it. The address of the IO APIC
maps to the same IO APIC for all CPUs though. But that's also good because all CPUs
want to use the same IO APIC anyway.
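
As an illustration of how the memory-mapped local APIC is accessed, here is a minimal sketch in C. The register offsets (0x20 for the ID register, 0xF0 for the spurious interrupt vector register with its enable bit 8) come from the Intel manual. The function names are made up, and the base pointer is passed as a parameter so the sketch can be exercised against an ordinary array. On real hardware it would point at the mapping of 0xFEE00000.

```c
#include <assert.h>
#include <stdint.h>

#define LAPIC_DEFAULT_BASE 0xFEE00000u  /* default physical address        */
#define LAPIC_REG_ID       0x020u       /* local APIC ID register          */
#define LAPIC_REG_SVR      0x0F0u       /* spurious interrupt vector reg.  */
#define LAPIC_SVR_ENABLE   (1u << 8)    /* APIC software enable bit        */

/* Registers are 32 bits wide and aligned on 16-byte boundaries. */
static inline uint32_t lapic_read(volatile uint32_t *base, uint32_t reg)
{
    assert((reg & 0xF) == 0);
    return base[reg / 4];
}

static inline void lapic_write(volatile uint32_t *base, uint32_t reg,
                               uint32_t value)
{
    assert((reg & 0xF) == 0);
    base[reg / 4] = value;
}

/* The APIC ID lives in bits 31:24 of the ID register (xAPIC mode). */
uint32_t lapic_id(volatile uint32_t *base)
{
    return lapic_read(base, LAPIC_REG_ID) >> 24;
}

/* Software-enable the local APIC and set the spurious vector. */
void lapic_enable(volatile uint32_t *base, uint8_t spurious_vector)
{
    lapic_write(base, LAPIC_REG_SVR, LAPIC_SVR_ENABLE | spurious_vector);
}
```

Because every CPU sees its own APIC at the same address, the same lapic_id call returns a different value depending on which CPU executes it.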

The SMP_TRAMPOLINE constant is the address where I want the APs to jump to when starting. This address must be aligned
on a 4k boundary because the SIPI message takes the page number as a parameter, which is why I SHR the address by 12 (divide by 4096).
And since the APs will start in 16-bit mode, the address must reside under the 1 MB barrier. STARTEDCPUS is a 64-bit bitfield
that represents the CPUs. Each bit gets set by the APs (cpuX sets bit X).
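
The values involved can be sketched in C as follows. This is only an illustration under stated assumptions: it builds the low dword of the interrupt command register (ICR) for the INIT and SIPI messages using the "all excluding self" destination shorthand, which is the form shown in the Intel manual's example and may differ from what my own code does. All function names are made up.

```c
#include <assert.h>
#include <stdint.h>

#define ICR_DELIVERY_INIT    0x00000500u  /* delivery mode 101b (INIT)      */
#define ICR_DELIVERY_STARTUP 0x00000600u  /* delivery mode 110b (start-up)  */
#define ICR_LEVEL_ASSERT     0x00004000u  /* level = assert                 */
#define ICR_ALL_EXCL_SELF    0x000C0000u  /* shorthand: all excluding self  */

/* The SIPI vector is the 4k page number of the trampoline address. */
uint8_t sipi_vector(uint32_t trampoline)
{
    assert((trampoline & 0xFFF) == 0);  /* must be 4k aligned       */
    assert(trampoline < 0x100000);      /* must be below 1 MB       */
    return (uint8_t)(trampoline >> 12);
}

/* ICR low dword for the INIT message. */
uint32_t icr_init(void)
{
    return ICR_ALL_EXCL_SELF | ICR_LEVEL_ASSERT | ICR_DELIVERY_INIT;
}

/* ICR low dword for the SIPI message pointing at the trampoline. */
uint32_t icr_sipi(uint32_t trampoline)
{
    return ICR_ALL_EXCL_SELF | ICR_LEVEL_ASSERT | ICR_DELIVERY_STARTUP
         | sipi_vector(trampoline);
}

/* Bitfield helpers in the spirit of STARTEDCPUS (cpuX sets bit X).
 * On real hardware the update must be atomic, since several APs
 * may set their bit at the same time. */
uint64_t cpu_mark_started(uint64_t started, unsigned cpu)
{
    assert(cpu < 64);
    return started | (1ull << cpu);
}

int cpu_is_started(uint64_t started, unsigned cpu)
{
    assert(cpu < 64);
    return (int)((started >> cpu) & 1);
}
```

For a trampoline at 0x1000, sipi_vector returns 0x01 and icr_sipi returns 0x000C4601. The usual sequence is one INIT followed, after a delay, by the SIPI.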

Application processors trampoline code

I decided to put the Application Processor's trampoline code in the bootloader (I've got 512 bytes of room, that
should be enough). The bootloader is a good place because it is below the 1 MB mark, the source file is
compiled as 16-bit code and all the initialisation is done there anyway. But when an AP starts, it is
given a start address aligned on a 4k page boundary, and the bootloader is at 0x7C00. So the bootloader
will copy a "jmp" to 0x1000 that jumps to the bootloader's AP init function. So the order of execution is:

AP receives SIPI with vector 0x01

AP jumps to 0x1000

Code at 0x1000 will make AP jump to 0x7C0:

AP will switch to protected mode and jump to KernelMain

KernelMain will check in MSR[0x1B] (IA32_APIC_BASE) whether this is an AP or the BSP. If it is the BSP, then jump to normal initialisation
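
The check on MSR 0x1B can be sketched in C as follows. The value of the MSR is passed in as a parameter because reading it needs the privileged rdmsr instruction. The bit positions (bit 8 for the bootstrap processor flag, bit 11 for the APIC global enable) come from the Intel manual. The function and enum names are made up, and the sketch assumes the local APIC has not been globally disabled.

```c
#include <assert.h>
#include <stdint.h>

#define IA32_APIC_BASE_MSR 0x1Bu
#define APIC_BASE_BSP      (1ull << 8)   /* set only on the bootstrap CPU */
#define APIC_BASE_ENABLE   (1ull << 11)  /* APIC global enable            */
#define APIC_BASE_ADDR     0x000FFFFFFFFFF000ull

enum cpu_role { CPU_ROLE_AP = 0, CPU_ROLE_BSP = 1 };

/* Decide which initialisation path to take from the MSR value. */
enum cpu_role cpu_role_from_apic_base(uint64_t msr)
{
    /* Assumption of this sketch: the APIC is globally enabled. */
    assert(msr & APIC_BASE_ENABLE);
    return (msr & APIC_BASE_BSP) ? CPU_ROLE_BSP : CPU_ROLE_AP;
}

/* Physical base address of the local APIC registers. */
uint64_t apic_base_address(uint64_t msr)
{
    return msr & APIC_BASE_ADDR;
}
```

A typical value on the bootstrap processor is 0xFEE00900, while an application processor would read 0xFEE00800.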