Eric W. Biederman wrote:
>>> The direction of this patch seems reasonable. The details are broken.
>>> The common case for relocatable kernels today is kdump. A situation
>>> with very minimal memory. In that situation the kernel needs to run
>>> where we put it, modifying the kernel to not run where it gets put
>>> is a problem.
>> I thought in the kdump case you typically loaded it pretty high? Either
>> which way, kdump is always loaded by kexec, so it should just be a
>> matter of updating kexec to zero the runtime_start field, no?
>
> Yes. In practice it doesn't matter. I just don't want to get into a
> contest with the kernel about who knows better how to put the kernel
> in memory the bootloader or the kernel decompressor.
>
>> Basically
>> this is the bootloader saying "do what I say, dammit." Since the
>> existing protocol doesn't have a way to unambiguously communicate one
>> direction versus another (see below), it seems like a relatively small
>> issue involving only one tool. Suboptimal, yes.
>
> The existing protocol doesn't have the option of anything else.
>
> Physical start has always been <= the alignment for x86 and x86_64,
> in any real world configuration.

That assumption seems to be the fundamental flaw of the relocation
protocol as written, and is pretty much what provoked this whole thing.
We really do want to run above 16 MB, not just because of the 15 MB hole
but also for ZONE_DMA reasons.

> Something goofy may have happened during unification, I thought I had
> removed physical start as totally unnecessary from x86_64.
>
> In the non-kdump case this is interesting. I know of instances where
> kexec is burned in firmware. So I am strongly reluctant to make anything
> that feels like a true backwards incompatible change.
>
> Those systems also don't have the stupid 15MB hole either.

OK, kexec in firmware is probably a showstopper... assuming *those*
kexec instances care about the exact final location of the code.
Otherwise, if all they are doing is loading the kernel and want it to
take over the machine, the proposed behavior (realign the kernel to a
more optimal point) is pretty much The Right Thing. Could you expand on
this use case? This seems like a key piece of the puzzle.

It's pretty well understood that we can't require changes to the tons
of deployed bootloaders, but at the same time we're stuck in a case with
overloaded semantics that have to be disambiguated.

> On the 64bit kernel 2MB really is required. We run at a fixed virtual
> address and use 2MB pages. So anything less that 2MB really won't work.
>
> So I think it would be a bad idea if we had bootloaders ignoring the
> alignment.
>
> With the suggested start address, it probably make sense to only
> export our true alignment requirement.

On 32 bits (which is the only case where one megabyte could possibly
matter) we *can* run at 1 MB, and that was the main case I was worrying
about there. On the other hand, even very early Linux just barely ran
in 4 MB of RAM, and perhaps an alignment restriction of 4 MB (the
non-PAE case) handles even the smallest configurations? If so we can
probably get away with just disallowing alignment < 2 MB and use your
solution.
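
To make the alignment arithmetic concrete, here is a minimal sketch in C
of how a loader might round a candidate load address up to the kernel's
alignment while refusing anything below 2 MB. The names
(MIN_KERNEL_ALIGN, align_load_addr) are made up for illustration and are
not part of the boot protocol or of any existing loader.

```c
#include <stdint.h>

/* Assumed floor from the discussion above: reject alignment below 2 MB. */
#define MIN_KERNEL_ALIGN (UINT64_C(2) << 20)

/*
 * Round addr up to the next multiple of align.
 * Returns 0 if align is below the floor or is not a power of two.
 * Hypothetical helper, not taken from kexec or the kernel; it does not
 * guard against overflow for addresses near the top of the 64-bit range.
 */
static inline uint64_t align_load_addr(uint64_t addr, uint64_t align)
{
	if (align < MIN_KERNEL_ALIGN || (align & (align - 1)) != 0)
		return 0;
	return (addr + align - 1) & ~(align - 1);
}
```

With a 4 MB alignment, a request to load at 1 MB ends up at 4 MB, while
a request for 16 MB stays at 16 MB.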

>>> I expect we will still want to update kexec to be able to take
>>> advantage of loadtime_size (runtime_size seems like the wrong name).
>> Well, it is the amount of memory the kernel needs during runtime (as
>> opposed to during loading.) I admit it's not an ideal name, though. On
>> the other hand, simply calling it kernel_start and kernel_size seemed
>> ambiguous.
>
> It is the amount of memory we need before a true memory allocator is
> initialized. Essentially text+data+bss. How about we call it init_size?
>
> Perhaps we should have:
> init_size
> best start (As a 64bit field please)
> optimum align (Or we flip it around)

I did think about that (64 bits), but I came to the conclusion that in
any case where we're supporting loading over 4 GB we need to be fully
relocatable anyway -- plus we need a whole bunch of other protocol
changes. This is not in itself a reason not to do it, but the size of
the initialized header is limited to just over 127 bytes without a much
bigger change (since the size of the structure has to fit inside a
single signed byte at 0x201).
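
The limit comes from the short jump at 0x200: its 8-bit signed
displacement at 0x201 is relative to the next instruction at 0x202, so
the header cannot extend past 0x202 + 127 = 0x281. A rough sketch of
that arithmetic in C, assuming setup points at the start of the
real-mode setup image (the function name is hypothetical):

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Offset of the end of the setup header, derived from the short jump
 * (opcode 0xEB) at 0x200. Returns 0 if the jump opcode is missing or
 * the displacement is negative. Hypothetical helper for illustration.
 */
static inline size_t setup_header_end(const uint8_t *setup)
{
	int8_t disp;

	if (setup[0x200] != 0xEB)
		return 0;
	disp = (int8_t)setup[0x201];
	if (disp < 0)
		return 0;
	/* The displacement is relative to the byte after the jump. */
	return 0x202 + (size_t)disp;
}
```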