Details

This change moves the ppc64 direct map base address up to high memory and fixes a -- surprisingly large -- amount of fallout from the change. We want to move it for three reasons:

It makes NULL and related addresses invalid pointers in the kernel, giving us a fighting chance of catching NULL-pointer bugs.

It needs to be here for POWER9 Radix translation to work.

If 64-bit Book-E grows a direct map, it also needs to be in high memory.

The new base address is the base of the fourth radix quadrant (the minimum kernel address in this translation mode). It was also chosen because all supported CPUs ignore at least the first two bits of addresses in real mode, which allows direct-map addresses to be used in real-mode handlers. This behavior is required by Linux and is part of the architecture standard starting in POWER ISA 3, so it can be relied upon.
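To make the address arithmetic concrete, here is a minimal sketch of how a direct map based at the fourth quadrant behaves. It assumes the base is 0xc000000000000000 (top two address bits set), which is what "base of the fourth radix quadrant" implies; the helper names (`phys_to_dmap`, `real_mode_ea`, `is_dmap_addr`) are hypothetical and are not the kernel's actual macros.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: fourth quadrant means the top two address bits are 0b11. */
#define DMAP_BASE      0xc000000000000000ULL
/* Real mode is described as ignoring at least the top two address bits. */
#define REAL_MODE_MASK 0x3fffffffffffffffULL

/* Hypothetical helper: map a physical address into the direct map. */
static inline uint64_t
phys_to_dmap(uint64_t pa)
{
	/* The physical address must not already use the top two bits. */
	assert((pa & DMAP_BASE) == 0);
	return (pa | DMAP_BASE);
}

/* Hypothetical helper: the address a real-mode access would actually use. */
static inline uint64_t
real_mode_ea(uint64_t ea)
{
	return (ea & REAL_MODE_MASK);
}

/* Hypothetical helper: true if the address lies in the direct map region. */
static inline bool
is_dmap_addr(uint64_t va)
{
	return ((va & DMAP_BASE) == DMAP_BASE);
}
```

Under these assumptions, a direct-map pointer used from a real-mode handler resolves to the physical address it was derived from, while NULL and other small addresses fall outside the direct map.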

The kernel continues to execute from addresses below the (new) direct map with a 1:1 mapping. This is a consequence of the early bootstrap code, which will be addressed in a follow-up commit on systems that can handle it (all but Apple hardware when running with OF rather than FDT).

Because the amount of fallout from moving this was surprisingly large, testing on a diverse set of systems would be appreciated. So far, this has been tested only on Apple hardware (working well), PS3 (likewise), and pSeries and PowerNV (in QEMU).

But this is still very confusing, since there are just two places in the code where we use mfdsisr, and it is always with %r31.
I am wondering if something caused the code to leave the kernel code, and now I am debugging somewhere else (OpenFirmware?).

That sounds plausible. I think there is a possible failure mode here involving firmware calls. When we call into OF on pSeries, we disable interrupts and restore OF's trap handlers. But the MMU is still on, so we could, in principle, cross a segment boundary and get a DSE or ISE, which could be more common now. What happens if you disable trap handler save/restore in ofw_save_trap_vec()/ofw_restore_trap_vec()?

Another thing to try with the Open Firmware crash is to set 'usefdt=1' in the loader. For me, either of these resolves the crash. Could you verify this on your hardware?

What I think is going on is that we are hitting a failure mode that was always present with the trap save/restore code added in r258504 by andreast. If the kernel -- any thread -- takes a trap after OF's trap vectors are restored, Bad Things happen. We disable interrupts, which helps, but there is still the following corner case, which has always been with us but seems more likely now: The MMU is still on when we restore the OF vectors. If the thread takes a segment exception, the kernel blows up. That could happen for a number of reasons: we could touch KVA we haven't used in a while while setting up the OF call, for instance. More perniciously, the hypervisor is free to invalidate arbitrary SLB entries at will, so any thread could take an ISE/DSE at any time for no reason. This includes the other CPUs, which have their MMUs on for the duration.

There are three possible workarounds that come to mind:

1. We turn off the trap save/restore code. This will break POWER5 systems.

2. We move all threads into MMU-off mode in the rendezvous, spinning on some small block of assembler that uses no stack.

3. We make usefdt=1 the default on all non-Apple systems (it doesn't seem to work right on Apple hardware, for unclear reasons, but seems to be 100% reliable on IBM-ish systems).

I think my preferred approach would be 3, then, post-commit, try to widen the scope of usefdt to include Apple hardware and start retiring the giant pile of hacks that is ofwcall*. Opinions?

Thanks for being invited to the review.
Well, I'm looking forward to a solution. POWER5+ is not my first priority; it is broken anyway, genius iflib. But I can at least test the bring-up.
A more important priority is the G5 support. I'd need one and I'm willing to help.
For POWER8/9, others have to jump in.

With the current patch, I was able to test it properly on my pseries VM, but I didn't use usefdt=1, mainly because I didn't know where to set it. Do I need to recompile the loader to use it, or is it a parameter somewhere?

And it was working fine?

usefdt is just a command-line argument to the loader. At the loader prompt, you can do:
OK set usefdt=1
OK boot

If I do not set usefdt=1, it works fine. If I set it, it breaks during early startup.

0xe000000000008700: at 0xe0000000000087fc
0xe000000000008790: at .uma_dbg_getslab+0x38
0xe000000000008820: at .uma_dbg_alloc+0x4c
0xe0000000000088b0: at .uma_zalloc_arg+0x1a8
0xe000000000008980: at .malloc_init+0x70
0xe000000000008a10: at .mi_startup+0x11c
0xe000000000008aa0: at btext+0xb4

This is fascinating. Could you verify that this happens without this patch (I assume it is unrelated?) and give some more details on your setup that gave this error. I would like to try to get to the bottom of it.

I tested it on a kernel without this patchset and the problem still seems to happen, so it is not related to this change.

Boots fine on Book-E. After this lands, and I finish my other work, I'll give 64-bit Book-E a DMAP, too, which will reduce the vmparam.h diff even further. Actually, on closer inspection, we should be able to just move the kernel to 0xe00.... already anyway.