At one point, I attempted to write my own kernel just like this, and spent the entire time mucking about in real mode trying to write my own bootloader. Had I known that getting it to boot with GRUB was so easy, I would have gone with that instead.

If anyone wants to try this, I highly recommend doing it inside a virtual machine. On Linux, QEMU works quite well. There appear to be simplified instructions for getting GRUB to work with it here.

There’s a slight problem in this tutorial: it assumes that the boot loader will set ESP (the stack pointer) to an appropriate location for the stack. However, the Multiboot standard states that ESP is undefined on entry, and that the OS should set up its own stack as soon as it needs one (here the CALL instruction uses the stack, and the compiled C code may well use it too).

An easy way to solve this is to reserve some bytes in the .bss section of the executable for the stack by adding a new section in the assembly file:

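[section .bss align=16]
resb 8192       ; reserve 8 KiB of uninitialised space for the stack
stack_end:      ; the x86 stack grows downward, so the label marks its top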

Then before you make use of the stack (between cli and call kmain would be appropriate in this case), you need to set the stack pointer: