I'm writing a physical memory allocator with a bitmap for an x86_64 kernel. I've reserved a part of my virtual address space for this data structure. When allocating a physical page, I go through the data structure and find the first available page and allocate it. What I'm wondering is:

When I create the bitmap structure, I'm thinking of mapping all the virtual addresses, but not allocating physical frames to them (present bit = 0). When these pages are accessed, I will rely on my page fault handler to allocate a physical page to back this virtual address.

I get that there is a circular dependency here. The way I solved it is to set a bit in the page table entry that tells the page fault handler this is a physical page allocator frame, so the handler allocates the backing frame through a separate path (specifically, the first bit of every page of my bitmap is reserved for mapping the bitmap pages themselves).
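For what it's worth, here's a minimal sketch of what that fault-handler check might look like. This assumes bit 9 is the marker (on x86_64, bits 9-11 are software-available in any PTE, and when the present bit is clear the rest of the entry is free for software use too); the name `PTE_PMM_BITMAP` is made up for the example:

```c
#include <stdint.h>
#include <stdbool.h>

#define PTE_PRESENT     (1ULL << 0)
#define PTE_PMM_BITMAP  (1ULL << 9)  /* software-available bit: "bitmap page, allocate specially" */

/* Returns true when a non-present PTE is flagged as belonging to the
 * physical-memory-manager bitmap, i.e. the fault handler must take a
 * special allocation path instead of recursing into the normal allocator. */
bool pte_is_lazy_bitmap_page(uint64_t pte)
{
    return !(pte & PTE_PRESENT) && (pte & PTE_PMM_BITMAP);
}
```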

Are there any downsides to doing it this way (as opposed to allocating the backing frames for the bitmap up front)? Would there be a case, for example, where I should disable interrupts and request physical frames?

What is the recommended way of allocating memory for the physical frame bitmap?

Current PC memory maps put all the actual stuff at the start, i.e. the first 4GB, and maybe a bit beyond that. It's not like the old original PC memory map, where you had RAM at the start, then a hole, then all the hardware stuff at the end (in the 20-bit days). Now the hardware stuff is intermixed in between the RAM, and the whole thing has a "top of memory" address well below architectural or even CPU limits.

So, what I'd do is this: First, calculate the top of memory address (that being the highest possible RAM address) from the memory map. If you divide that address by 4096 rounding down, you have the maximum number of physical pages in RAM (actually, a little more than that, but we'll correct this later). Now, if you divide this by 8 rounding up, you have the number of bytes needed to store the bitmap. Divide by 4096 rounding up, and you have the number of pages needed for the bitmap. Now you can allocate the bitmap.
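That size calculation is just a chain of integer divisions; a small sketch (names are my own, not from the post):

```c
#include <stdint.h>

#define PAGE_SIZE 4096ULL

/* Number of pages the bitmap itself occupies, given the highest possible
 * RAM address (exclusive top-of-memory) from the firmware memory map. */
uint64_t bitmap_pages(uint64_t top_of_memory)
{
    uint64_t frames = top_of_memory / PAGE_SIZE;      /* one bit per frame, rounds down */
    uint64_t bytes  = (frames + 7) / 8;               /* round up to whole bytes */
    return (bytes + PAGE_SIZE - 1) / PAGE_SIZE;       /* round up to whole pages */
}
```

For example, a 10 GiB top of memory (0x2,8000,0000) gives 0x28,0000 frames, 0x5,0000 bytes, and 80 pages for the bitmap.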

What I'd do for this is to run a watermark allocator: You copy all the physical RAM info into another buffer. Then you remove from that copy all the regions currently in use, or that still might contain useful data, like the BDA, EBDA, the kernel image, the stack, and the kernel argument (though I put the latter on the kernel stack). No need to think of the system tables, though, as those are typically excluded from the RAM info, anyway (or you cut them out of the copy by removing all the regions named anything but RAM in the memory map).

Now you can just allocate more memory by returning the first free page in the region list, and incrementing the start of the region by 4096, removing the region from the list if it becomes empty in the process. This way you can allocate all the physical pages needed to set up paging, and to set up a mapping for the bitmap itself. Finally, you can initialize the bitmap now, by clearing all bits, and then setting the bits for all the areas we just talked about.
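A watermark allocator over such a region list is only a few lines. This is a hosted-C sketch of the idea (the `struct region` copy with in-use areas already removed is assumed to exist; in a real loader the "remove the region" step would shift the array rather than bump a pointer):

```c
#include <stdint.h>
#include <stddef.h>

#define PAGE_SIZE 4096ULL

/* One usable-RAM region copied out of the firmware memory map,
 * with in-use areas (kernel image, stack, BDA/EBDA, ...) removed. */
struct region { uint64_t start, end; };

struct watermark {
    struct region *regions;
    size_t count;
};

/* Hand out the first free page and bump the region start; drop the
 * region when it runs dry. Returns 0 when everything is exhausted. */
uint64_t watermark_alloc(struct watermark *wm)
{
    if (wm->count == 0)
        return 0;
    uint64_t page = wm->regions[0].start;
    wm->regions[0].start += PAGE_SIZE;
    if (wm->regions[0].start >= wm->regions[0].end) {
        wm->regions++;   /* skip the now-empty entry */
        wm->count--;
    }
    return page;
}
```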

My laptop has 8GB of RAM, so the top of memory is going to be beyond that line, let's say at 0x2,8000,0000 (assuming the memory holes pile up to 2GB). Then the number of bits needed is 0x28,0000, so the number of bytes needed is 0x5,0000, or 320 KB. That would fit entirely into low memory, if it weren't for the paging structure taking up space as well.

A watermark allocator is entirely sufficient for this task as it is abandoned as soon as the bitmap allocator is functional. If you play your cards right, the watermark allocator will only be in the kernel loader, and thus freed as soon as the kernel itself scrubs the page table of all remains of the loader.

I like the attempt to use virtual memory management tricks in kernel-space (I wish more people would try); but unfortunately for this specific case...

zaptor wrote:

Are there any downsides to doing it this way (as opposed to allocating the backing frames for the bitmap up front)?

When there's a lot of free RAM it doesn't matter if you use RAM for the bitmap because that RAM would have been free/wasted/unused anyway, and when there's very little free RAM the bitmap must have already been created. Essentially, as physical memory goes from "lots of free RAM" to "very little free RAM" (e.g. during boot and soon after when apps start getting used) there isn't an instance in time where dynamically extending the bitmap helps physical memory usage.

It might allow you to boot a little faster (by postponing the creation of the bitmap) but you'd be saving a small amount of time during boot (where you can do "wholesale bulk initialisation" efficiently) and it'll be costing you a lot more time later in boot (where you're paying for the additional costs of page faults and doing "one page of bitmap at a time initialisation"); so that's not really a great compromise either.

The other thing is that it'd be more complex (higher risk of bugs, higher cost of maintenance); and this is where it starts getting scary (e.g. imagine there's multiple CPUs and you need a lock protecting the bitmap because a "lock-free" approach is too hard when you're dynamically extending the bitmap).

zaptor wrote:

What is the recommended way of allocating memory for the physical frame bitmap?

The recommended way is to not use a bitmap, at least not for all of physical memory.

Note that the physical memory map is "multiple discontinuous blobs of usable RAM" - e.g. for a simple system (no NUMA or anything) you might have ~3 GiB (with various pieces taken out of the first 1 MiB for legacy stuff) then a large 1 GiB gap (for PCI hole, HPET and APIC, system ROM) then another GiB of usable RAM (with a few more small pieces taken out for ACPI areas, etc).

For a single bitmap; for optimised searching (e.g. keeping track of "lowest potentially free page" to reduce the area to search, combined with an outer loop used to check 64 bits at once) it still ends up being expensive (e.g. think of all the cache misses as you plow through large areas of "will never be usable RAM").
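The "64 bits at once plus a search hint" inner loop looks roughly like this. A sketch only: the hint handling and the `1 = free` convention are assumptions, and `__builtin_ctzll` is a GCC/Clang builtin (use `_BitScanForward64` or equivalent elsewhere):

```c
#include <stdint.h>
#include <stddef.h>

/* Find the first set bit (1 = free) at or after *hint, scanning 64 bits
 * at a time; *hint is advanced so the next search skips the exhausted
 * prefix. Returns the frame number, or (uint64_t)-1 when nothing is free. */
uint64_t bitmap_find_free(const uint64_t *bitmap, size_t words, size_t *hint)
{
    for (size_t w = *hint; w < words; w++) {
        if (bitmap[w] != 0) {            /* at least one free frame in this word */
            *hint = w;
            return (uint64_t)w * 64 + (uint64_t)__builtin_ctzll(bitmap[w]);
        }
    }
    *hint = words;                        /* everything is allocated */
    return (uint64_t)-1;
}
```

This still has to touch every word of the gap once per search unless the hint (or the shared-zero-page trick below the gaps) saves you, which is Brendan's point about cache misses.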

If you do use a single bitmap; I'd strongly recommend generating the whole bitmap during boot; and I'd consider using the same "single physical page full of zeros" for any of the "will never be usable RAM" areas that are large enough (to reduce physical memory used by the bitmap a little and reduce some of the cache misses and cache pollution as you search through large gaps).
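Since one bitmap page covers 4096 * 8 = 32768 frames (128 MiB of physical address space), only the bitmap pages that lie entirely inside a "never usable" hole can share the zero page. A small helper for that arithmetic (names are mine, not from the post):

```c
#include <stdint.h>

#define FRAMES_PER_BITMAP_PAGE (4096ULL * 8)   /* 32768 frames = 128 MiB */

/* For a "will never be usable RAM" hole spanning frames
 * [first_frame, end_frame), compute the range of bitmap pages [*lo, *hi)
 * that lie entirely inside the hole and can therefore all be mapped to
 * one shared physical page of zeros. Empty range (lo == hi) if the hole
 * doesn't cover a whole bitmap page. */
void shared_zero_range(uint64_t first_frame, uint64_t end_frame,
                       uint64_t *lo, uint64_t *hi)
{
    *lo = (first_frame + FRAMES_PER_BITMAP_PAGE - 1) / FRAMES_PER_BITMAP_PAGE;
    *hi = end_frame / FRAMES_PER_BITMAP_PAGE;
    if (*hi < *lo)
        *hi = *lo;
}
```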

Cheers,

Brendan

_________________For all things; perfection is, and will always remain, impossible to achieve in practice. However; by striving for perfection we create things that are as perfect as practically possible. Let the pursuit of perfection be our guide.

Brendan wrote:

When there's a lot of free RAM it doesn't matter if you use RAM for the bitmap because that RAM would have been free/wasted/unused anyway, and when there's very little free RAM the bitmap must have already been created. Essentially, as physical memory goes from "lots of free RAM" to "very little free RAM" (e.g. during boot and soon after when apps start getting used) there isn't an instance in time where dynamically extending the bitmap helps physical memory usage.

If you mark pages in the bitmap as "0 = already allocated/not available and 1 = can be allocated", you could check when a bitmap page becomes all zeros, release it, and replace it with a copy-on-write "single physical page full of zeros" page. This way you could squeeze out a few extra pages in "very little free RAM" conditions. But I doubt that for, say, 4 GiB of in-use RAM, another ~100 KiB will make much difference.
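The reclaim check is just an all-zeros scan over the bitmap page; a sketch (the remap-to-zero-page step itself is page-table work and omitted here):

```c
#include <stdint.h>
#include <stddef.h>
#include <stdbool.h>

/* With "1 = free", a bitmap page that has become all zeros describes
 * 32768 fully allocated frames; its backing frame can be handed back to
 * the pool and the virtual page remapped read-only onto a shared zero
 * page (copy-on-write, so a later frame_free can repopulate it). */
bool bitmap_page_reclaimable(const uint64_t *page_words)
{
    for (size_t i = 0; i < 4096 / sizeof(uint64_t); i++)
        if (page_words[i] != 0)
            return false;
    return true;
}
```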

Since a "fixed" bitmap uses ~0.003% of the RAM it represents, I don't think it is worth adding that much complexity here.

As for search optimization, this feels like a nice opportunity to advertise an idea from way back. That way you do not have to "plow through large areas of 'will never be usable RAM'" - you land in the area of usable RAM quickly.

Same as my predecessor, I'd like to offer you an alternative algorithm of mine. If you don't need multiple physically contiguous pages, then it might be something for you. It does O(1) allocation and free in about 50 lines. The memory footprint is between 1-2 MB per 1 GB (~0.1-0.2%). Some modifications may be needed if you want to remove pages that are reserved (e.g. used for MMIO), but it shouldn't take more than a few lines, and it's surely less work than your current solution. You can find the file in my git repository: https://gitlab.com/abich6990/O1_Page_Allocator. Also, while I'm fairly sure it works, the algorithm has only been tested in a simulation; I haven't yet had time to try it in a real operating system, because coming up with the theory and writing the code happened just recently. Hopefully this helps, and if you try it, please report your results back.
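For readers who don't want to follow the link: the classic way to get O(1) allocation and free without contiguity is a free-page stack. This is a generic sketch of that well-known technique, not necessarily the linked algorithm; the ~0.2% footprint matches one 8-byte entry per 4 KiB frame:

```c
#include <stdint.h>
#include <stddef.h>

/* Generic O(1) physical frame allocator: free frame numbers live in an
 * array used as a stack. alloc pops, free pushes; neither ever scans. */
struct frame_stack {
    uint64_t *frames;   /* capacity = total number of manageable frames */
    size_t top;         /* number of free frames currently stored */
};

/* Pop a free frame; returns (uint64_t)-1 when out of memory. */
uint64_t frame_alloc(struct frame_stack *s)
{
    return s->top ? s->frames[--s->top] : (uint64_t)-1;
}

/* Push a frame back; reserved/MMIO frames are excluded simply by never
 * pushing them during initialisation. */
void frame_free(struct frame_stack *s, uint64_t frame)
{
    s->frames[s->top++] = frame;
}
```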
