*Re: [PATCH] arm64, vmcoreinfo : Append 'MAX_USER_VA_BITS' and 'MAX_PHYSMEM_BITS' to vmcoreinfo
2019-01-31 1:48 ` Dave Young@ 2019-01-31 10:00 ` Bhupesh Sharma
2019-01-31 14:03 ` Dave Anderson
2019-02-04 16:04 ` Kazuhito Hagio2 siblings, 0 replies; 29+ messages in thread
From: Bhupesh Sharma @ 2019-01-31 10:00 UTC (permalink / raw)
To: Dave Young
Cc: Mark Rutland, Kazuhito Hagio, lijiang, bhe, ard.biesheuvel,
catalin.marinas, kexec, Will Deacon, AKASHI Takahiro,
James Morse, Borislav Petkov, anderson, bhupesh.linux,
linux-arm-kernel
On 01/31/2019 07:18 AM, Dave Young wrote:
> + more people
> On 01/30/19 at 05:53pm, Bhupesh Sharma wrote:
>> With ARMv8.2-LVA and LPA architecture extensions, arm64 hardware which
>> supports these extensions can support upto 52-bit virtual and 52-bit
>> physical addresses respectively.
>>
>> Since at the moment we enable the support of these extensions via CONFIG
>> flags, e.g.
>> - LPA via CONFIG_ARM64_PA_BITS_52
>>
>> there are no clear mechanisms in user-space right now to
>> deteremine these CONFIG flag values and also determine the PARange and
>> VARange address values.
>>
>> User-space tools like 'makedumpfile' and 'crash-utility' can instead
>> use the 'MAX_USER_VA_BITS' and 'MAX_PHYSMEM_BITS' values to determine
>> the maximum virtual address and physical address (respectively)
>> supported by underlying kernel.
>>
>> A reference 'makedumpfile' implementation which uses this approach to
>> determining the maximum physical address is available in [0].
>>
>> [0]. https://github.com/bhupesh-sharma/makedumpfile/blob/52-bit-pa-support-via-vmcore-v1/arch/arm64.c#L490
>
> I'm not objecting the patch, just want to make sure to make clear about
> things and make sure these issues are aware by people, and leave arm
> people to review the arm bits.
>
> 1. MAX_PHYSMEM_BITS
> As we previously found, back to 2014 makedumpfile took a patch to read the
> value from vmcore but the kernel patch was not accepted.
> So we should first make clear if this is really needed, why other arches
> do not need this in makedumpfile.
I explained this a bit in my reply to Suzuki's and James's review
comments yesterday, but let me summarize the same again for better clarity:
Let's take the example of x86. We have CONFIG_X86_5LEVEL config flag to
indicate 5 level page-table support. We export the same in vmcoreinfo
for x86_64 using:
void arch_crash_save_vmcoreinfo(void)
{
<.. snip..>
vmcoreinfo_append_str("NUMBER(pgtable_l5_enabled)=%d\n",
pgtable_l5_enabled());
}
Also a simple grep in makedumpfile and crash for the same indicates that
the user-space code determines the MAX_PHYSMEM_BITS value using the
'pgtable_l5_enabled' value available in vmcoreinfo:
Example from makedumpfile:
-------------------------
int
get_versiondep_info_x86_64(void)
{
/*
* On linux-2.6.26, MAX_PHYSMEM_BITS is changed to 44 from 40.
*/
if (info->kernel_version < KERNEL_VERSION(2, 6, 26))
info->max_physmem_bits = _MAX_PHYSMEM_BITS_ORIG;
else if (info->kernel_version < KERNEL_VERSION(2, 6, 31))
info->max_physmem_bits = _MAX_PHYSMEM_BITS_2_6_26;
else if(check_5level_paging())
info->max_physmem_bits = _MAX_PHYSMEM_BITS_5LEVEL;
else
info->max_physmem_bits = _MAX_PHYSMEM_BITS_2_6_31;
...
}
As we can see above, we use several if-else cases to determine the
'MAX_PHYSMEM_BITS', one of which is setting it to 52-bit, if the
'pgtable_l5_enabled' value is available and TRUE in vmcoreinfo.
So, we determine 'MAX_PHYSMEM_BITS' value in makedumpfile via a
vmcoreinfo export'ed variable (rather than using named
'MAX_PHYSMEM_BITS' its 'pgtable_l5_enabled' that is being used).
Since for arm64, we don't have a single CONFIG flag for 52-bit addresses
spaces (kernel VA, user-space VA and PA), so its better to export the
respective CONFIG flags in the vmcoreinfo directly.
> If we really need it then should it be arm64 only?
See above, since archs like x86 use a single flag: CONFIG_X86_5LEVEL,
whereas arm64 can use the combination of following flags to indicate
combinations of various address spaces:
- 48-bit kernel VA + 48-bit user-space VA + 52-bit PA
- 48-bit kernel VA + 52-bit user-space VA + 52-bit PA
- 52-bit kernel VA + 52-bit user-space VA + 52-bit PA
CONFIG_ARM64_64K_PAGES
CONFIG_ARM64_USER_VA_BITS_52
CONFIG_ARM64_VA_BITS
CONFIG_ARM64_PA_BITS_52
CONFIG_ARM64_PA_BITS
CONFIG_EXPERT, and
CONFIG_ARM64_FORCE_52BIT
so probably its not correct to compare the two cases one-to-one (its
more like an apple and an orange comparison).
> If it is arm64 only then the makedumpfile code should read this number
> only for arm64.
>
> Also Lianbo added the vmcoreinfo documents, I believe it stays in -tip
> tree, need to make sure to document this as well.
Sure, I will send a separate patch to fix the same, once this gets in.
>
> 2. MAX_USER_VA_BITS
> Does makedumpfile care about userspace VA bits?
Yes. Consider the case 48-bit kernel VA and 52-bit user-space VA, which
is a perfectly valid case on arm64. In such cases VA_BITS is set to 48,
whereas MAX_USER_VA_BITS is set to 52, which allows user-space
applications which use 52-bit virtual address to pass a hint to 'mmap'
to get high addresses.
I do not see other code
> doing this, Kazu and Dave A should be able to comment.
I talked to Dave A. yesterday off-list, I think he mentioned that these
changes are useful for crash-utility as well and he was hoping it gets
accepted soon so that kernel-debugging tools can handle increased
address spaces on arm64.
But it will be great to have reviews/ACKs from Dave A and others as well.
Thanks,
Bhupesh
>
> I tend to doubt about this.
>
>>
>> Cc: AKASHI Takahiro <takahiro.akashi@linaro.org>
>> Cc: Mark Rutland <mark.rutland@arm.com>
>> Cc: Will Deacon <will.deacon@arm.com>
>> Cc: James Morse <james.morse@arm.com>
>> Signed-off-by: Bhupesh Sharma <bhsharma@redhat.com>
>> ---
>> arch/arm64/kernel/crash_core.c | 2 ++
>> 1 file changed, 2 insertions(+)
>>
>> diff --git a/arch/arm64/kernel/crash_core.c b/arch/arm64/kernel/crash_core.c
>> index ca4c3e12d8c5..ad231be5c0d8 100644
>> --- a/arch/arm64/kernel/crash_core.c
>> +++ b/arch/arm64/kernel/crash_core.c
>> @@ -10,6 +10,8 @@
>> void arch_crash_save_vmcoreinfo(void)
>> {
>> VMCOREINFO_NUMBER(VA_BITS);
>> + VMCOREINFO_NUMBER(MAX_USER_VA_BITS);
>> + VMCOREINFO_NUMBER(MAX_PHYSMEM_BITS);
>> /* Please note VMCOREINFO_NUMBER() uses "%d", not "%x" */
>> vmcoreinfo_append_str("NUMBER(kimage_voffset)=0x%llx\n",
>> kimage_voffset);
>> --
>> 2.7.4
>>
>>
>> _______________________________________________
>> kexec mailing list
>> kexec@lists.infradead.org
>> http://lists.infradead.org/mailman/listinfo/kexec
>
> Thanks
> Dave
>
_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel^permalinkrawreply [flat|nested] 29+ messages in thread

*Re: [PATCH] arm64, vmcoreinfo : Append 'MAX_USER_VA_BITS' and 'MAX_PHYSMEM_BITS' to vmcoreinfo
2019-02-04 14:35 ` Bhupesh Sharma@ 2019-02-04 15:31 ` Robin Murphy
2019-02-12 4:55 ` Bhupesh Sharma
2019-02-04 16:56 ` James Morse1 sibling, 1 reply; 29+ messages in thread
From: Robin Murphy @ 2019-02-04 15:31 UTC (permalink / raw)
To: Bhupesh Sharma, James Morse
Cc: Mark Rutland, ard.biesheuvel, catalin.marinas, kexec,
Will Deacon, AKASHI Takahiro, bhupesh.linux, linux-arm-kernel
On 04/02/2019 14:35, Bhupesh Sharma wrote:
[...]
>> Also hardcoding the PTE calculation to use the high address bit mask
>> always will break the backward compatibility with older kernels (which
>> don't support 52-bit address space extensions).
No it won't. There's no difference between an old kernel, a new kernel
on a CPU without ARMv8.2-LPA, or a new kernel on a CPU with ARMv8.2-LPA
in a system which happens to have less than 49 bits of physical memory
map - in all those cases the relevant bits are either RES0 or just
actually 0 in the PTE, so replacing 4 bits of zeros with 4 bits of other
zeros in the final physical address has no effect whatsoever other than
taking a couple of extra instructions to perform.
If you're running a 64K page kernel on a system with an SMMU, note how
that's already been "broken" for nearly a year now ;)
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/iommu/io-pgtable-arm.c#n211
Robin.
_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel^permalinkrawreply [flat|nested] 29+ messages in thread

*Re: [PATCH] arm64, vmcoreinfo : Append 'MAX_USER_VA_BITS' and 'MAX_PHYSMEM_BITS' to vmcoreinfo
2019-02-04 14:35 ` Bhupesh Sharma
2019-02-04 15:31 ` Robin Murphy@ 2019-02-04 16:56 ` James Morse1 sibling, 0 replies; 29+ messages in thread
From: James Morse @ 2019-02-04 16:56 UTC (permalink / raw)
To: Bhupesh Sharma
Cc: Mark Rutland, ard.biesheuvel, catalin.marinas, kexec,
Will Deacon, AKASHI Takahiro, bhupesh.linux, linux-arm-kernel
Hi Bhupesh,
On 04/02/2019 14:35, Bhupesh Sharma wrote:
> On 01/31/2019 03:09 AM, Bhupesh Sharma wrote:
>> On 01/30/2019 08:51 PM, James Morse wrote:
>>> On 01/30/2019 12:23 PM, Bhupesh Sharma wrote:
>>>> With ARMv8.2-LVA and LPA architecture extensions, arm64 hardware which
>>>> supports these extensions can support upto 52-bit virtual and 52-bit
>>>> physical addresses respectively.
>>>>
>>>> Since at the moment we enable the support of these extensions via CONFIG
>>>> flags, e.g.
>>>> - LPA via CONFIG_ARM64_PA_BITS_52
>>>>
>>>> there are no clear mechanisms in user-space right now to
>>>> deteremine these CONFIG flag values and also determine the PARange and
>>>> VARange address values.
>>>> User-space tools like 'makedumpfile' and 'crash-utility' can instead
>>>> use the 'MAX_USER_VA_BITS' and 'MAX_PHYSMEM_BITS' values to determine
>>>> the maximum virtual address and physical address (respectively)
>>>> supported by underlying kernel.
>>>>
>>>> A reference 'makedumpfile' implementation which uses this approach to
>>>> determining the maximum physical address is available in [0].
>>>
>>> Why does it need to know?
>>>
>>> (Suzuki asked the same question on your earlier version)
>>> https://lore.kernel.org/linux-arm-kernel/cff44754-7fe4-efea-bc8e-4dde2277c821@arm.com/>> I have shared some details (after discussion with our test teams) in reply to
>> the review comments from Suzuki here:
>> http://lists.infradead.org/pipermail/kexec/2019-January/022389.html, and
>> http://lists.infradead.org/pipermail/kexec/2019-January/022390.html
>>
>> Just to summarize, I mentioned in my replies to the review comments tha the
>> makedumpfile implementation (for decoding the PTE) was just as an example,
>> however there can be other user-space applications, for e.g a user-space
>> application running with 48-bit kernel VA and 52-bit user space VA and
>> requesting allocation in 'high' address via a 'hint' to mmap.
But vmcoreinfo is the wrong place to expose this information. (it can be
configured off, and is only accessible to root)
>>> From your github link it looks like you use this to re-assemble the two bits
>>> of the PFN from the pte. Can't you always do this for 64K pages? CPUs with
>>> the feature always do this too, its not something the kernel turns on.
>>
>> Ok, let me try to give some perspective of a common makedumpfile use-case
>> before I jump into the details:
>> Also hardcoding the PTE calculation to use the high address bit mask always
>> will break the backward compatibility with older kernels (which don't support
>> 52-bit address space extensions).
What would go wrong?
The hardware ignores those bits and supplies zero. As far as I can see the
kernel has always generated zero there.
>> (b). Also x86_64 already has a vmcoreinfo export for 'pgtable_l5_enabled':
So? 5-level page tables is a different feature. I agree you need to know the
number of levels to walk the page-tables, but that isn't how arm64's 52bit stuff
works.
> Ping. Since this patch fixes a regression with user-space tools like
> makedumpfile and crash-utility which are broken since arm64 kernels with 52-bit
> VA and PA support are available (and distributions which enable them), would
> request review comments/ack on this simple change.
Broken how? What goes wrong?
I can see how a not-52bit-aware crash/gdb/whatever would be confused by high
bits being set in the physical address, and possibly throw them away.
But once it supports this for 64K pages, I don't see what can go wrong if those
bits aren't set. Why does it need to know?
Thanks,
James
_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel^permalinkrawreply [flat|nested] 29+ messages in thread

*Re: [PATCH] arm64, vmcoreinfo : Append 'MAX_USER_VA_BITS' and 'MAX_PHYSMEM_BITS' to vmcoreinfo
2019-02-04 15:31 ` Robin Murphy@ 2019-02-12 4:55 ` Bhupesh Sharma
2019-02-12 10:49 ` Robin Murphy0 siblings, 1 reply; 29+ messages in thread
From: Bhupesh Sharma @ 2019-02-12 4:55 UTC (permalink / raw)
To: Robin Murphy, James Morse
Cc: Mark Rutland, ard.biesheuvel, catalin.marinas, kexec,
Will Deacon, AKASHI Takahiro, bhupesh.linux, linux-arm-kernel
Hi Robin,
On 02/04/2019 09:01 PM, Robin Murphy wrote:
> On 04/02/2019 14:35, Bhupesh Sharma wrote:
> [...]
>>> Also hardcoding the PTE calculation to use the high address bit mask
>>> always will break the backward compatibility with older kernels
>>> (which don't support 52-bit address space extensions).
>
> No it won't. There's no difference between an old kernel, a new kernel
> on a CPU without ARMv8.2-LPA, or a new kernel on a CPU with ARMv8.2-LPA
> in a system which happens to have less than 49 bits of physical memory
> map - in all those cases the relevant bits are either RES0 or just
> actually 0 in the PTE, so replacing 4 bits of zeros with 4 bits of other
> zeros in the final physical address has no effect whatsoever other than
> taking a couple of extra instructions to perform.
Right, but lets think this from a distribution user p-o-v. Why would a
user with newer kernel and CPU which doesn't support ARMv8.2 LPA want to
update a user-space utility? Well unless something is broken in the
user-space. Requesting an upgrade is easier in this case, as the
user-space components like crash-utility/kexec-tools (or for that matter
any user-space tool which requires a page-table walk to determine
vaddr/paddr values) are indeed broken with this combination.
On the other hand, if a user a having a old kernel, CPU which doesn't
support LPA, but if we still want them to upgrade to a newer version of
user-space utility (with a upgrade fix note reading something like: "Add
52-bit PA support"), surely that would not be an ideal case requiring an
upgrade as his underlying environment doesn't support the same.
That's what I meant when I said that we need to take care of the
backward compatibility also when we propose new code changes to
user-space packages.
Thanks,
Bhupesh
> If you're running a 64K page kernel on a system with an SMMU, note how
> that's already been "broken" for nearly a year now ;)
>
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/iommu/io-pgtable-arm.c#n211
>
>
>
> Robin.
_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel^permalinkrawreply [flat|nested] 29+ messages in thread

*Re: [PATCH] arm64, vmcoreinfo : Append 'MAX_USER_VA_BITS' and 'MAX_PHYSMEM_BITS' to vmcoreinfo
2019-02-04 16:04 ` Kazuhito Hagio@ 2019-02-12 5:07 ` Bhupesh Sharma
2019-02-12 10:44 ` Dave Young0 siblings, 1 reply; 29+ messages in thread
From: Bhupesh Sharma @ 2019-02-12 5:07 UTC (permalink / raw)
To: Kazuhito Hagio, Dave Young
Cc: Mark Rutland, lijiang, bhe, ard.biesheuvel, catalin.marinas,
kexec, Will Deacon, AKASHI Takahiro, James Morse,
Borislav Petkov, anderson, linux-arm-kernel
Hi Kazu,
On 02/04/2019 09:34 PM, Kazuhito Hagio wrote:
> On 1/30/2019 8:48 PM, Dave Young wrote:
>> + more people
>> On 01/30/19 at 05:53pm, Bhupesh Sharma wrote:
>>> With ARMv8.2-LVA and LPA architecture extensions, arm64 hardware which
>>> supports these extensions can support upto 52-bit virtual and 52-bit
>>> physical addresses respectively.
>>>
>>> Since at the moment we enable the support of these extensions via CONFIG
>>> flags, e.g.
>>> - LPA via CONFIG_ARM64_PA_BITS_52
>>>
>>> there are no clear mechanisms in user-space right now to
>>> deteremine these CONFIG flag values and also determine the PARange and
>>> VARange address values.
>>>
>>> User-space tools like 'makedumpfile' and 'crash-utility' can instead
>>> use the 'MAX_USER_VA_BITS' and 'MAX_PHYSMEM_BITS' values to determine
>>> the maximum virtual address and physical address (respectively)
>>> supported by underlying kernel.
>>>
>>> A reference 'makedumpfile' implementation which uses this approach to
>>> determining the maximum physical address is available in [0].
>>>
>>> [0].
>> https://github.com/bhupesh-sharma/makedumpfile/blob/52-bit-pa-support-via-vmcore-v1/arch/arm64.c#L490
>>
>> I'm not objecting the patch, just want to make sure to make clear about
>> things and make sure these issues are aware by people, and leave arm
>> people to review the arm bits.
>>
>> 1. MAX_PHYSMEM_BITS
>> As we previously found, back to 2014 makedumpfile took a patch to read the
>> value from vmcore but the kernel patch was not accepted.
>> So we should first make clear if this is really needed, why other arches
>> do not need this in makedumpfile.
>>
>> If we really need it then should it be arm64 only?
>>
>> If it is arm64 only then the makedumpfile code should read this number
>> only for arm64.
>
> Sorry for the delay.
>
> According to the kernel patch, some of arm32 platforms may need it
> http://lists.infradead.org/pipermail/kexec/2014-May/011909.html
> but except for them (and arm64), makedumpfile can manage with kernel
> version and some switches to determine this value so far.
>
>>
>> Also Lianbo added the vmcoreinfo documents, I believe it stays in -tip
>> tree, need to make sure to document this as well.
>>
>> 2. MAX_USER_VA_BITS
>> Does makedumpfile care about userspace VA bits? I do not see other code
>> doing this, Kazu and Dave A should be able to comment.
>
> The mapping makedumpfile uses on arm64 is swapper_pg_dir only, so
> unless the config affects its structure or something, makedumpfile
> will not need this value.
I captured this case in more details while sending out the makedumpfile
enablement patch for ARMv8.2-LVA (see [0]), but here is a brief summary
on the same:
Since at the moment we enable the support of the ARMv8.2-LVA extension
for 52-bit user-space VA in the kernel via a CONFIG flags
(CONFIG_ARM64_USER_VA_BITS_52), so there are no clear mechanisms in
user-space to determine this CONFIG
flag value and use it to determine the address range values.
Since 'VA_BITS' are already exported via vmcoreinfo, if we export
'MAX_USER_VA_BITS' as well, we can use the same in user-space to check
if the 'MAX_USER_VA_BITS' value is greater than 'VA_BITS'. If yes, then
we are running a use-case where user-space is 52-bit while the
underlying kernel is still 48-bit.
The increased 'PTRS_PER_PGD' value for such cases needs to be then
calculated as is done by the underlying kernel (see
'arch/arm64/include/asm/pgtable-hwdef.h' for details):
#define PTRS_PER_PGD (1 << (MAX_USER_VA_BITS - PGDIR_SHIFT))
Also, note that 'arch/arm64/include/asm/memory.h' defines
'MAX_USER_VA_BITS' as 'VA_BITS' in case 'CONFIG_ARM64_USER_VA_BITS_52'
is set to 'n':
#ifdef CONFIG_ARM64_USER_VA_BITS_52
#define MAX_USER_VA_BITS 52
#else
#define MAX_USER_VA_BITS VA_BITS
#endif
So, makedumpfile will need this symbol exported in vmcore to make the
above determination.
[0]. http://lists.infradead.org/pipermail/kexec/2019-February/022425.html
Thanks,
Bhupesh
_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel^permalinkrawreply [flat|nested] 29+ messages in thread

*Re: [PATCH] arm64, vmcoreinfo : Append 'MAX_USER_VA_BITS' and 'MAX_PHYSMEM_BITS' to vmcoreinfo
2019-02-12 5:07 ` Bhupesh Sharma@ 2019-02-12 10:44 ` Dave Young
2019-02-12 19:59 ` Bhupesh Sharma0 siblings, 1 reply; 29+ messages in thread
From: Dave Young @ 2019-02-12 10:44 UTC (permalink / raw)
To: Bhupesh Sharma
Cc: Mark Rutland, Kazuhito Hagio, lijiang, bhe, ard.biesheuvel,
catalin.marinas, kexec, Will Deacon, AKASHI Takahiro,
James Morse, Borislav Petkov, anderson, linux-arm-kernel
On 02/12/19 at 10:37am, Bhupesh Sharma wrote:
> Hi Kazu,
>
> On 02/04/2019 09:34 PM, Kazuhito Hagio wrote:
> > On 1/30/2019 8:48 PM, Dave Young wrote:
> > > + more people
> > > On 01/30/19 at 05:53pm, Bhupesh Sharma wrote:
> > > > With ARMv8.2-LVA and LPA architecture extensions, arm64 hardware which
> > > > supports these extensions can support upto 52-bit virtual and 52-bit
> > > > physical addresses respectively.
> > > >
> > > > Since at the moment we enable the support of these extensions via CONFIG
> > > > flags, e.g.
> > > > - LPA via CONFIG_ARM64_PA_BITS_52
> > > >
> > > > there are no clear mechanisms in user-space right now to
> > > > deteremine these CONFIG flag values and also determine the PARange and
> > > > VARange address values.
> > > >
> > > > User-space tools like 'makedumpfile' and 'crash-utility' can instead
> > > > use the 'MAX_USER_VA_BITS' and 'MAX_PHYSMEM_BITS' values to determine
> > > > the maximum virtual address and physical address (respectively)
> > > > supported by underlying kernel.
> > > >
> > > > A reference 'makedumpfile' implementation which uses this approach to
> > > > determining the maximum physical address is available in [0].
> > > >
> > > > [0].
> > > https://github.com/bhupesh-sharma/makedumpfile/blob/52-bit-pa-support-via-vmcore-v1/arch/arm64.c#L490
> > >
> > > I'm not objecting the patch, just want to make sure to make clear about
> > > things and make sure these issues are aware by people, and leave arm
> > > people to review the arm bits.
> > >
> > > 1. MAX_PHYSMEM_BITS
> > > As we previously found, back to 2014 makedumpfile took a patch to read the
> > > value from vmcore but the kernel patch was not accepted.
> > > So we should first make clear if this is really needed, why other arches
> > > do not need this in makedumpfile.
> > >
> > > If we really need it then should it be arm64 only?
> > >
> > > If it is arm64 only then the makedumpfile code should read this number
> > > only for arm64.
> >
> > Sorry for the delay.
> >
> > According to the kernel patch, some of arm32 platforms may need it
> > http://lists.infradead.org/pipermail/kexec/2014-May/011909.html
> > but except for them (and arm64), makedumpfile can manage with kernel
> > version and some switches to determine this value so far.
> >
> > >
> > > Also Lianbo added the vmcoreinfo documents, I believe it stays in -tip
> > > tree, need to make sure to document this as well.
> > >
> > > 2. MAX_USER_VA_BITS
> > > Does makedumpfile care about userspace VA bits? I do not see other code
> > > doing this, Kazu and Dave A should be able to comment.
> >
> > The mapping makedumpfile uses on arm64 is swapper_pg_dir only, so
> > unless the config affects its structure or something, makedumpfile
> > will not need this value.
>
> I captured this case in more details while sending out the makedumpfile
> enablement patch for ARMv8.2-LVA (see [0]), but here is a brief summary on
> the same:
>
> Since at the moment we enable the support of the ARMv8.2-LVA extension for
> 52-bit user-space VA in the kernel via a CONFIG flags
> (CONFIG_ARM64_USER_VA_BITS_52), so there are no clear mechanisms in
> user-space to determine this CONFIG
> flag value and use it to determine the address range values.
>
> Since 'VA_BITS' are already exported via vmcoreinfo, if we export
> 'MAX_USER_VA_BITS' as well, we can use the same in user-space to check if
> the 'MAX_USER_VA_BITS' value is greater than 'VA_BITS'. If yes, then we are
> running a use-case where user-space is 52-bit while the underlying kernel is
> still 48-bit.
Problem is why this is needed, it sounds like you are talking about some
non-exist use case.
>
> The increased 'PTRS_PER_PGD' value for such cases needs to be then
> calculated as is done by the underlying kernel (see
> 'arch/arm64/include/asm/pgtable-hwdef.h' for details):
>
> #define PTRS_PER_PGD (1 << (MAX_USER_VA_BITS - PGDIR_SHIFT))
>
> Also, note that 'arch/arm64/include/asm/memory.h' defines 'MAX_USER_VA_BITS'
> as 'VA_BITS' in case 'CONFIG_ARM64_USER_VA_BITS_52' is set to 'n':
>
> #ifdef CONFIG_ARM64_USER_VA_BITS_52
> #define MAX_USER_VA_BITS 52
> #else
> #define MAX_USER_VA_BITS VA_BITS
> #endif
>
> So, makedumpfile will need this symbol exported in vmcore to make the above
> determination.
>
> [0]. http://lists.infradead.org/pipermail/kexec/2019-February/022425.html
>
> Thanks,
> Bhupesh
Thanks
Dave
_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel^permalinkrawreply [flat|nested] 29+ messages in thread

*Re: [PATCH] arm64, vmcoreinfo : Append 'MAX_USER_VA_BITS' and 'MAX_PHYSMEM_BITS' to vmcoreinfo
2019-02-12 4:55 ` Bhupesh Sharma@ 2019-02-12 10:49 ` Robin Murphy0 siblings, 0 replies; 29+ messages in thread
From: Robin Murphy @ 2019-02-12 10:49 UTC (permalink / raw)
To: Bhupesh Sharma, James Morse
Cc: Mark Rutland, ard.biesheuvel, catalin.marinas, kexec,
Will Deacon, AKASHI Takahiro, bhupesh.linux, linux-arm-kernel
On 12/02/2019 04:55, Bhupesh Sharma wrote:
> Hi Robin,
>
> On 02/04/2019 09:01 PM, Robin Murphy wrote:
>> On 04/02/2019 14:35, Bhupesh Sharma wrote:
>> [...]
>>>> Also hardcoding the PTE calculation to use the high address bit mask
>>>> always will break the backward compatibility with older kernels
>>>> (which don't support 52-bit address space extensions).
>>
>> No it won't. There's no difference between an old kernel, a new kernel
>> on a CPU without ARMv8.2-LPA, or a new kernel on a CPU with
>> ARMv8.2-LPA in a system which happens to have less than 49 bits of
>> physical memory map - in all those cases the relevant bits are either
>> RES0 or just actually 0 in the PTE, so replacing 4 bits of zeros with
>> 4 bits of other zeros in the final physical address has no effect
>> whatsoever other than taking a couple of extra instructions to perform.
>
> Right, but lets think this from a distribution user p-o-v. Why would a
> user with newer kernel and CPU which doesn't support ARMv8.2 LPA want to
> update a user-space utility? Well unless something is broken in the
> user-space. Requesting an upgrade is easier in this case, as the
> user-space components like crash-utility/kexec-tools (or for that matter
> any user-space tool which requires a page-table walk to determine
> vaddr/paddr values) are indeed broken with this combination.
Huh? Nothing gets broken unless we go out of our way to break it. The
point I'm making is that the format of vmcoreinfo *does not* need to
change in order for the userspace tool to support LPA. There is no
compatibility issue either way. If a user's machine doesn't support or
make use of LPA, then yeah, they should absolutely not have to upgrade
their tools for the sake of LPA regardless of what kernel they use.
> On the other hand, if a user a having a old kernel, CPU which doesn't
> support LPA, but if we still want them to upgrade to a newer version of
> user-space utility (with a upgrade fix note reading something like: "Add
> 52-bit PA support"), surely that would not be an ideal case requiring an
> upgrade as his underlying environment doesn't support the same.
I don't get why we would "still want them to upgrade" if there's no
technical reason for them to do so. That sounds like the kind of crap
that proprietary OS vendors pull :/
> That's what I meant when I said that we need to take care of the
> backward compatibility also when we propose new code changes to
> user-space packages.
And what I meant is that you can trivially update the userspace tools to
support LPA entirely transparently, without requiring kernel changes or
creating any compatibility issues in any direction.
Robin.
_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel^permalinkrawreply [flat|nested] 29+ messages in thread

*Re: [PATCH] arm64, vmcoreinfo : Append 'MAX_USER_VA_BITS' and 'MAX_PHYSMEM_BITS' to vmcoreinfo
2019-02-13 11:15 ` Dave Young@ 2019-02-13 18:22 ` James Morse
2019-02-13 19:52 ` Kazuhito Hagio
2019-02-14 19:30 ` Bhupesh Sharma0 siblings, 2 replies; 29+ messages in thread
From: James Morse @ 2019-02-13 18:22 UTC (permalink / raw)
To: Dave Young
Cc: Mark Rutland, Kazuhito Hagio, lijiang, bhe, ard.biesheuvel,
catalin.marinas, Bhupesh Sharma, kexec, Will Deacon,
AKASHI Takahiro, anderson, Borislav Petkov, linux-arm-kernel
Hi guys,
On 13/02/2019 11:15, Dave Young wrote:
> On 02/12/19 at 11:03pm, Kazuhito Hagio wrote:
>> On 2/12/2019 2:59 PM, Bhupesh Sharma wrote:
>>> BTW, in the makedumpfile enablement patch thread for ARMv8.2 LVA
>>> (which I sent out for 52-bit User space VA enablement) (see [0]), Kazu
>>> mentioned that the changes look necessary.
>>>
>>> [0]. http://lists.infradead.org/pipermail/kexec/2019-February/022431.html
>>
>>>>> The increased 'PTRS_PER_PGD' value for such cases needs to be then
>>>>> calculated as is done by the underlying kernel
Aha! Nothing to do with which-bits-are-pfn in the tables...
You need to know if the top level PGD is 512bytes or bigger. As we use a
kmem-cache the adjacent data could be some else's page tables.
Is this really a problem though? You can't pull the user-space pgd pointers out
of no-where, you must have walked some task_struct and struct_mm's to find them.
In which case you would have the VMAs on hand to tell you if its in the mapped
user range.
It would be good to avoid putting something arch-specific in here if we can at
all help it.
>>>>> (see
>>>>> 'arch/arm64/include/asm/pgtable-hwdef.h' for details):
>>>>>
>>>>> #define PTRS_PER_PGD (1 << (MAX_USER_VA_BITS - PGDIR_SHIFT))
>>
>> Yes, this is the reason why makedumpfile needs the MAX_USER_VA_BITS.
>> It is used for pgd_index() also in makedumpfile to walk page tables.
>>
>> /* to find an entry in a page-table-directory */
>> #define pgd_index(addr) (((addr) >> PGDIR_SHIFT) & (PTRS_PER_PGD - 1))
>
> Since Dave mentioned crash tool does not need it, but crash should also
> travel the pg tables.
>
> If this is really necessary it would be good to describe what will
> happen without the patch, eg. some user visible error from an actual test etc.
Yes please, it would really help if there was a specific example we could discuss.
Thanks,
James
_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel^permalinkrawreply [flat|nested] 29+ messages in thread

*RE: [PATCH] arm64, vmcoreinfo : Append 'MAX_USER_VA_BITS' and 'MAX_PHYSMEM_BITS' to vmcoreinfo
2019-02-13 18:22 ` James Morse@ 2019-02-13 19:52 ` Kazuhito Hagio
2019-02-15 17:34 ` James Morse
2019-02-14 19:30 ` Bhupesh Sharma1 sibling, 1 reply; 29+ messages in thread
From: Kazuhito Hagio @ 2019-02-13 19:52 UTC (permalink / raw)
To: James Morse, Dave Young, Bhupesh Sharma
Cc: Mark Rutland, lijiang, bhe, ard.biesheuvel, catalin.marinas,
kexec, Will Deacon, AKASHI Takahiro, anderson, Borislav Petkov,
linux-arm-kernel
On 2/13/2019 1:22 PM, James Morse wrote:
> Hi guys,
>
> On 13/02/2019 11:15, Dave Young wrote:
> > On 02/12/19 at 11:03pm, Kazuhito Hagio wrote:
> >> On 2/12/2019 2:59 PM, Bhupesh Sharma wrote:
> >>> BTW, in the makedumpfile enablement patch thread for ARMv8.2 LVA
> >>> (which I sent out for 52-bit User space VA enablement) (see [0]), Kazu
> >>> mentioned that the changes look necessary.
> >>>
> >>> [0]. http://lists.infradead.org/pipermail/kexec/2019-February/022431.html
> >>
> >>>>> The increased 'PTRS_PER_PGD' value for such cases needs to be then
> >>>>> calculated as is done by the underlying kernel
>
> Aha! Nothing to do with which-bits-are-pfn in the tables...
>
> You need to know if the top level PGD is 512bytes or bigger. As we use a
> kmem-cache the adjacent data could be some else's page tables.
>
> Is this really a problem though? You can't pull the user-space pgd pointers out
> of no-where, you must have walked some task_struct and struct_mm's to find them.
> In which case you would have the VMAs on hand to tell you if its in the mapped
> user range.
>
> It would be good to avoid putting something arch-specific in here if we can at
> all help it.
>
>
> >>>>> (see
> >>>>> 'arch/arm64/include/asm/pgtable-hwdef.h' for details):
> >>>>>
> >>>>> #define PTRS_PER_PGD (1 << (MAX_USER_VA_BITS - PGDIR_SHIFT))
> >>
> >> Yes, this is the reason why makedumpfile needs the MAX_USER_VA_BITS.
> >> It is used for pgd_index() also in makedumpfile to walk page tables.
> >>
> >> /* to find an entry in a page-table-directory */
> >> #define pgd_index(addr) (((addr) >> PGDIR_SHIFT) & (PTRS_PER_PGD - 1))
> >
> > Since Dave mentioned crash tool does not need it, but crash should also
> > travel the pg tables.
The crash utility is always invoked with vmlinux, so it can read the
vabits_user variable directly from vmcore, but makedumpfile can not.
> > If this is really necessary it would be good to describe what will
> > happen without the patch, eg. some user visible error from an actual test etc.
>
> Yes please, it would really help if there was a specific example we could discuss.
With 52-bit user space and 48-bit kernel space configuration,
makedumpfile will not be able to convert a virtual kernel address
to a physical address, and fail to capture a dumpfile, because the
pgd_index() will return a wrong index.
But I don't have any suitable test system on hand, so have not tried
the kernel configuration actually. If found, I'll try.
Bhupesh, do you have any test result?
Thanks,
Kazu
>
>
> Thanks,
>
> James
_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel^permalinkrawreply [flat|nested] 29+ messages in thread

*Re: [PATCH] arm64, vmcoreinfo : Append 'MAX_USER_VA_BITS' and 'MAX_PHYSMEM_BITS' to vmcoreinfo
2019-02-13 19:52 ` Kazuhito Hagio@ 2019-02-15 17:34 ` James Morse
2019-02-15 18:01 ` Bhupesh Sharma0 siblings, 1 reply; 29+ messages in thread
From: James Morse @ 2019-02-15 17:34 UTC (permalink / raw)
To: Kazuhito Hagio, Bhupesh Sharma
Cc: Mark Rutland, lijiang, bhe, ard.biesheuvel, catalin.marinas,
Steve Capper, kexec, Will Deacon, AKASHI Takahiro, anderson,
Kristina Martsenko, Borislav Petkov, Dave Young,
linux-arm-kernel
Hi guys,
(CC: +Steve, +Kristina) "What's the best way of letting user-space know the MMU
config when 52-bit VA and pointer-auth may be in use?"
On 13/02/2019 19:52, Kazuhito Hagio wrote:
> On 2/13/2019 1:22 PM, James Morse wrote:
>> On 13/02/2019 11:15, Dave Young wrote:
>>> On 02/12/19 at 11:03pm, Kazuhito Hagio wrote:
>>>> On 2/12/2019 2:59 PM, Bhupesh Sharma wrote:
>>>>> BTW, in the makedumpfile enablement patch thread for ARMv8.2 LVA
>>>>> (which I sent out for 52-bit User space VA enablement) (see [0]), Kazu
>>>>> mentioned that the changes look necessary.
>>>>>
>>>>> [0]. http://lists.infradead.org/pipermail/kexec/2019-February/022431.html
>>>>
>>>>>>> The increased 'PTRS_PER_PGD' value for such cases needs to be then
>>>>>>> calculated as is done by the underlying kernel
>>
>> Aha! Nothing to do with which-bits-are-pfn in the tables...
>>
>> You need to know if the top level PGD is 512bytes or bigger. As we use a
>> kmem-cache the adjacent data could be some else's page tables.
>>
>> Is this really a problem though? You can't pull the user-space pgd pointers out
>> of no-where, you must have walked some task_struct and struct_mm's to find them.
>> In which case you would have the VMAs on hand to tell you if its in the mapped
>> user range.
>>
>> It would be good to avoid putting something arch-specific in here if we can at
>> all help it.
>>>>>>> (see
>>>>>>> 'arch/arm64/include/asm/pgtable-hwdef.h' for details):
>>>>>>>
>>>>>>> #define PTRS_PER_PGD (1 << (MAX_USER_VA_BITS - PGDIR_SHIFT))
>>>>
>>>> Yes, this is the reason why makedumpfile needs the MAX_USER_VA_BITS.
>>>> It is used for pgd_index() also in makedumpfile to walk page tables.
>>>>
>>>> /* to find an entry in a page-table-directory */
>>>> #define pgd_index(addr) (((addr) >> PGDIR_SHIFT) & (PTRS_PER_PGD - 1))
>>>
>>> Since Dave mentioned crash tool does not need it, but crash should also
>>> travel the pg tables.
>
> The crash utility is always invoked with vmlinux, so it can read the
> vabits_user variable directly from vmcore, but makedumpfile can not.
(This sounds fragile. That symbol's name may change, it may disappear
completely! ... but I guess crash changes with every kernel release anyway)
>>> If this is really necessary it would be good to describe what will
>>> happen without the patch, eg. some user visible error from an actual test etc.
>>
>> Yes please, it would really help if there was a specific example we could discuss.
>
> With 52-bit user space and 48-bit kernel space configuration,
> makedumpfile will not be able to convert a virtual kernel address
> to a physical address, and fail to capture a dumpfile, because the
> pgd_index() will return a wrong index.
Got it, thanks!
(all this user stuff had me thinking it was user-space you were trying to walk).
Yes, this is because of commit e842dfb5a2d3 ("arm64: mm: Offset TTBR1 to allow
52-bit PTRS_PER_PGD"). The kernel has offset the ttbr1 value, if you try and
walk it without knowing the offset you get junk.
Ideally we tell you the offset with some 'ttbr1_offset=' in vmcoreinfo, but if
the offsetting code disappears, the kernel would still have to provide
'ttbr1_offset=0' for user-space to keep working.
I'd like to find something future-proof that always has an unambiguous meaning,
and isn't a problem if the kernel variable/symbol/kconfig names change.
With pointer-auth in use too you can't guess which bits are address and which
bits are data.
Taking arch-specific to its extreme, we could expose TCR_EL1, but this is a
problem if we ever switch that per task (some new bits may turn up with a new
feature). Some of those bits vary per cpu too, so we'd have to mask them out in
case user-space tries to conclude something from them.
My current best suggestion is to export:
from core code:
* USER_MMAP_END, the maximum value a user-space can try and mmap().
This would normally be TASK_SIZE, but x86 and powerpc also have support for
larger VA space, and its plumbed into mm slightly differently. We should have
one arch-independent property that covers all these. On arm64 this would be the
runtime va bits for user-space's TTBR. (This assumes the value isn't per-task)
arch specific:
* ARM64_TCR.T1SZ, the va bits mapped by the kernel's TTBR. (We can assume we'll
never flip user/kernel space). This has to be arch specific, it will always have
a value and its meaning comes from the ARM-ARM (so linux can't change it in the
future). It should be the same on every CPU.
* ARM64_TTBR1.BADDR, the pa of the kernel page tables, which implicitly has the
offset. Again this always has a value, and its meaning comes from the ARM-ARM.
If we ever get clever with different page-tables/TCR values on different CPUs,
these two should come from the same CPU.
I think this gives you what you need if user/kernel may both be using
pointer-auth and both may be using 52-bit va. I'm pretty sure the 48:52 bits can
be picked at boot time depending on the kernel kconfig and the hardware support.
Does anyone have a better idea? (or a corner where this won't work?)
Thanks,
James
_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel^permalinkrawreply [flat|nested] 29+ messages in thread

*Re: [PATCH] arm64, vmcoreinfo : Append 'MAX_USER_VA_BITS' and 'MAX_PHYSMEM_BITS' to vmcoreinfo
2019-02-15 17:34 ` James Morse@ 2019-02-15 18:01 ` Bhupesh Sharma
2019-02-18 15:27 ` Steve Capper
2019-02-19 20:47 ` Kazuhito Hagio0 siblings, 2 replies; 29+ messages in thread
From: Bhupesh Sharma @ 2019-02-15 18:01 UTC (permalink / raw)
To: James Morse
Cc: Mark Rutland, Kazuhito Hagio, lijiang, bhe, ard.biesheuvel,
catalin.marinas, Steve Capper, kexec, Will Deacon,
AKASHI Takahiro, anderson, Kristina Martsenko, Borislav Petkov,
Dave Young, linux-arm-kernel
Hi James,
On Fri, Feb 15, 2019 at 11:04 PM James Morse <james.morse@arm.com> wrote:
>
> Hi guys,
>
> (CC: +Steve, +Kristina) "What's the best way of letting user-space know the MMU
> config when 52-bit VA and pointer-auth may be in use?"
>
> On 13/02/2019 19:52, Kazuhito Hagio wrote:
> > On 2/13/2019 1:22 PM, James Morse wrote:
> >> On 13/02/2019 11:15, Dave Young wrote:
> >>> On 02/12/19 at 11:03pm, Kazuhito Hagio wrote:
> >>>> On 2/12/2019 2:59 PM, Bhupesh Sharma wrote:
> >>>>> BTW, in the makedumpfile enablement patch thread for ARMv8.2 LVA
> >>>>> (which I sent out for 52-bit User space VA enablement) (see [0]), Kazu
> >>>>> mentioned that the changes look necessary.
> >>>>>
> >>>>> [0]. http://lists.infradead.org/pipermail/kexec/2019-February/022431.html
> >>>>
> >>>>>>> The increased 'PTRS_PER_PGD' value for such cases needs to be then
> >>>>>>> calculated as is done by the underlying kernel
> >>
> >> Aha! Nothing to do with which-bits-are-pfn in the tables...
> >>
> >> You need to know if the top level PGD is 512bytes or bigger. As we use a
> >> kmem-cache the adjacent data could be some else's page tables.
> >>
> >> Is this really a problem though? You can't pull the user-space pgd pointers out
> >> of no-where, you must have walked some task_struct and struct_mm's to find them.
> >> In which case you would have the VMAs on hand to tell you if its in the mapped
> >> user range.
> >>
> >> It would be good to avoid putting something arch-specific in here if we can at
> >> all help it.
>
> >>>>>>> (see
> >>>>>>> 'arch/arm64/include/asm/pgtable-hwdef.h' for details):
> >>>>>>>
> >>>>>>> #define PTRS_PER_PGD (1 << (MAX_USER_VA_BITS - PGDIR_SHIFT))
> >>>>
> >>>> Yes, this is the reason why makedumpfile needs the MAX_USER_VA_BITS.
> >>>> It is used for pgd_index() also in makedumpfile to walk page tables.
> >>>>
> >>>> /* to find an entry in a page-table-directory */
> >>>> #define pgd_index(addr) (((addr) >> PGDIR_SHIFT) & (PTRS_PER_PGD - 1))
> >>>
> >>> Since Dave mentioned crash tool does not need it, but crash should also
> >>> travel the pg tables.
> >
> > The crash utility is always invoked with vmlinux, so it can read the
> > vabits_user variable directly from vmcore, but makedumpfile can not.
>
> (This sounds fragile. That symbol's name may change, it may disappear
> completely! ... but I guess crash changes with every kernel release anyway)
>
>
> >>> If this is really necessary it would be good to describe what will
> >>> happen without the patch, eg. some user visible error from an actual test etc.
> >>
> >> Yes please, it would really help if there was a specific example we could discuss.
> >
> > With 52-bit user space and 48-bit kernel space configuration,
> > makedumpfile will not be able to convert a virtual kernel address
> > to a physical address, and fail to capture a dumpfile, because the
> > pgd_index() will return a wrong index.
>
> Got it, thanks!
> (all this user stuff had me thinking it was user-space you were trying to walk).
>
> Yes, this is because of commit e842dfb5a2d3 ("arm64: mm: Offset TTBR1 to allow
> 52-bit PTRS_PER_PGD"). The kernel has offset the ttbr1 value, if you try and
> walk it without knowing the offset you get junk.
>
> Ideally we tell you the offset with some 'ttbr1_offset=' in vmcoreinfo, but if
> the offsetting code disappears, the kernel would still have to provide
> 'ttbr1_offset=0' for user-space to keep working.
>
> I'd like to find something future-proof that always has an unambiguous meaning,
> and isn't a problem if the kernel variable/symbol/kconfig names change.
>
> With pointer-auth in use too you can't guess which bits are address and which
> bits are data.
>
> Taking arch-specific to its extreme, we could expose TCR_EL1, but this is a
> problem if we ever switch that per task (some new bits may turn up with a new
> feature). Some of those bits vary per cpu too, so we'd have to mask them out in
> case user-space tries to conclude something from them.
>
>
> My current best suggestion is to export:
> from core code:
> * USER_MMAP_END, the maximum value a user-space can try and mmap().
> This would normally be TASK_SIZE, but x86 and powerpc also have support for
> larger VA space, and its plumbed into mm slightly differently. We should have
> one arch-independent property that covers all these. On arm64 this would be the
> runtime va bits for user-space's TTBR. (This assumes the value isn't per-task)
>
> arch specific:
> * ARM64_TCR.T1SZ, the va bits mapped by the kernel's TTBR. (We can assume we'll
> never flip user/kernel space). This has to be arch specific, it will always have
> a value and its meaning comes from the ARM-ARM (so linux can't change it in the
> future). It should be the same on every CPU.
> * ARM64_TTBR1.BADDR, the pa of the kernel page tables, which implicitly has the
> offset. Again this always has a value, and its meaning comes from the ARM-ARM.
> If we ever get clever with different page-tables/TCR values on different CPUs,
> these two should come from the same CPU.
>
>
> I think this gives you what you need if user/kernel may both be using
> pointer-auth and both may be using 52-bit va. I'm pretty sure the 48:52 bits can
> be picked at boot time depending on the kernel kconfig and the hardware support.
>
> Does anyone have a better idea? (or a corner where this won't work?)
I am not sure you got a chance to look at the two regression cases I
reported here:
<http://lists.infradead.org/pipermail/kexec/2019-February/022449.html>
Unfortunately the above suggestion doesn't provide any fix for
ARMv8.2-LPA regression (see text under heading '
(1). Regression Case 1 (ARMv8.2-LPA enabled kernel)')
After going through the regression reports, I think exporting
'MAX_USER_VA_BITS' and 'MAX_PHYSMEM_BITS' to vmcoreinfo is sufficient
for the above regressions (without over-complicating the stuff) as
ARM64_TCR.T1SZ and friends seem to arch specific as compared to
VA_BITS + 'MAX_USER_VA_BITS' .
Thanks,
Bhupesh
_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel^permalinkrawreply [flat|nested] 29+ messages in thread

*Re: [PATCH] arm64, vmcoreinfo : Append 'MAX_USER_VA_BITS' and 'MAX_PHYSMEM_BITS' to vmcoreinfo
2019-02-15 18:01 ` Bhupesh Sharma@ 2019-02-18 15:27 ` Steve Capper
2019-02-21 16:08 ` Bhupesh Sharma
2019-02-19 20:47 ` Kazuhito Hagio1 sibling, 1 reply; 29+ messages in thread
From: Steve Capper @ 2019-02-18 15:27 UTC (permalink / raw)
To: Bhupesh Sharma
Cc: Mark Rutland, Kazuhito Hagio, lijiang, bhe, ard.biesheuvel,
Catalin Marinas, kexec, Will Deacon, AKASHI Takahiro,
James Morse, Kristina Martsenko, Borislav Petkov, anderson, nd,
Dave Young, linux-arm-kernel
Hi Bhupesh,
Sorry for joining this thread late...
On Fri, Feb 15, 2019 at 11:31:56PM +0530, Bhupesh Sharma wrote:
> Hi James,
>
> On Fri, Feb 15, 2019 at 11:04 PM James Morse <james.morse@arm.com> wrote:
> >
> > Hi guys,
> >
> > (CC: +Steve, +Kristina) "What's the best way of letting user-space know the MMU
> > config when 52-bit VA and pointer-auth may be in use?"
> >
> > On 13/02/2019 19:52, Kazuhito Hagio wrote:
> > > On 2/13/2019 1:22 PM, James Morse wrote:
> > >> On 13/02/2019 11:15, Dave Young wrote:
> > >>> On 02/12/19 at 11:03pm, Kazuhito Hagio wrote:
> > >>>> On 2/12/2019 2:59 PM, Bhupesh Sharma wrote:
> > >>>>> BTW, in the makedumpfile enablement patch thread for ARMv8.2 LVA
> > >>>>> (which I sent out for 52-bit User space VA enablement) (see [0]), Kazu
> > >>>>> mentioned that the changes look necessary.
> > >>>>>
> > >>>>> [0]. http://lists.infradead.org/pipermail/kexec/2019-February/022431.html
> > >>>>
> > >>>>>>> The increased 'PTRS_PER_PGD' value for such cases needs to be then
> > >>>>>>> calculated as is done by the underlying kernel
> > >>
> > >> Aha! Nothing to do with which-bits-are-pfn in the tables...
> > >>
> > >> You need to know if the top level PGD is 512bytes or bigger. As we use a
> > >> kmem-cache the adjacent data could be some else's page tables.
> > >>
> > >> Is this really a problem though? You can't pull the user-space pgd pointers out
> > >> of no-where, you must have walked some task_struct and struct_mm's to find them.
> > >> In which case you would have the VMAs on hand to tell you if its in the mapped
> > >> user range.
> > >>
> > >> It would be good to avoid putting something arch-specific in here if we can at
> > >> all help it.
> >
> > >>>>>>> (see
> > >>>>>>> 'arch/arm64/include/asm/pgtable-hwdef.h' for details):
> > >>>>>>>
> > >>>>>>> #define PTRS_PER_PGD (1 << (MAX_USER_VA_BITS - PGDIR_SHIFT))
> > >>>>
> > >>>> Yes, this is the reason why makedumpfile needs the MAX_USER_VA_BITS.
> > >>>> It is used for pgd_index() also in makedumpfile to walk page tables.
> > >>>>
> > >>>> /* to find an entry in a page-table-directory */
> > >>>> #define pgd_index(addr) (((addr) >> PGDIR_SHIFT) & (PTRS_PER_PGD - 1))
> > >>>
> > >>> Since Dave mentioned crash tool does not need it, but crash should also
> > >>> travel the pg tables.
> > >
> > > The crash utility is always invoked with vmlinux, so it can read the
> > > vabits_user variable directly from vmcore, but makedumpfile can not.
> >
> > (This sounds fragile. That symbol's name may change, it may disappear
> > completely! ... but I guess crash changes with every kernel release anyway)
> >
> >
> > >>> If this is really necessary it would be good to describe what will
> > >>> happen without the patch, eg. some user visible error from an actual test etc.
> > >>
> > >> Yes please, it would really help if there was a specific example we could discuss.
> > >
> > > With 52-bit user space and 48-bit kernel space configuration,
> > > makedumpfile will not be able to convert a virtual kernel address
> > > to a physical address, and fail to capture a dumpfile, because the
> > > pgd_index() will return a wrong index.
> >
> > Got it, thanks!
> > (all this user stuff had me thinking it was user-space you were trying to walk).
> >
> > Yes, this is because of commit e842dfb5a2d3 ("arm64: mm: Offset TTBR1 to allow
> > 52-bit PTRS_PER_PGD"). The kernel has offset the ttbr1 value, if you try and
> > walk it without knowing the offset you get junk.
> >
> > Ideally we tell you the offset with some 'ttbr1_offset=' in vmcoreinfo, but if
> > the offsetting code disappears, the kernel would still have to provide
> > 'ttbr1_offset=0' for user-space to keep working.
> >
> > I'd like to find something future-proof that always has an unambiguous meaning,
> > and isn't a problem if the kernel variable/symbol/kconfig names change.
> >
> > With pointer-auth in use too you can't guess which bits are address and which
> > bits are data.
> >
> > Taking arch-specific to its extreme, we could expose TCR_EL1, but this is a
> > problem if we ever switch that per task (some new bits may turn up with a new
> > feature). Some of those bits vary per cpu too, so we'd have to mask them out in
> > case user-space tries to conclude something from them.
> >
> >
> > My current best suggestion is to export:
> > from core code:
> > * USER_MMAP_END, the maximum value a user-space can try and mmap().
> > This would normally be TASK_SIZE, but x86 and powerpc also have support for
> > larger VA space, and its plumbed into mm slightly differently. We should have
> > one arch-independent property that covers all these. On arm64 this would be the
> > runtime va bits for user-space's TTBR. (This assumes the value isn't per-task)
> >
> > arch specific:
> > * ARM64_TCR.T1SZ, the va bits mapped by the kernel's TTBR. (We can assume we'll
> > never flip user/kernel space). This has to be arch specific, it will always have
> > a value and its meaning comes from the ARM-ARM (so linux can't change it in the
> > future). It should be the same on every CPU.
> > * ARM64_TTBR1.BADDR, the pa of the kernel page tables, which implicitly has the
> > offset. Again this always has a value, and its meaning comes from the ARM-ARM.
> > If we ever get clever with different page-tables/TCR values on different CPUs,
> > these two should come from the same CPU.
> >
> >
> > I think this gives you what you need if user/kernel may both be using
> > pointer-auth and both may be using 52-bit va. I'm pretty sure the 48:52 bits can
> > be picked at boot time depending on the kernel kconfig and the hardware support.
> >
> > Does anyone have a better idea? (or a corner where this won't work?)
>
> I am not sure you got a chance to look at the two regression cases I
> reported here:
> <http://lists.infradead.org/pipermail/kexec/2019-February/022449.html>
>
> Unfortunately the above suggestion doesn't provide any fix for
> ARMv8.2-LPA regression (see text under heading '
> (1). Regression Case 1 (ARMv8.2-LPA enabled kernel)')
>
> After going through the regression reports, I think exporting
> 'MAX_USER_VA_BITS' and 'MAX_PHYSMEM_BITS' to vmcoreinfo is sufficient
> for the above regressions (without over-complicating the stuff) as
> ARM64_TCR.T1SZ and friends seem to arch specific as compared to
> VA_BITS + 'MAX_USER_VA_BITS' .
>
For MAX_USER_VA_BITS, IIUC you are just after a value of PTRS_PER_PGD?
Why not just add PTRS_PER_PGD to the vmcoreinfo?
FWIW it is possible in vaddr_to_paddr_arm64 to detect a zero pgd entry
then try again with another ptrs_per_pgd value (granted this is a little
hacky).
Cheers,
--
Steve
_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel^permalinkrawreply [flat|nested] 29+ messages in thread

*Re: [PATCH] arm64, vmcoreinfo : Append 'MAX_USER_VA_BITS' and 'MAX_PHYSMEM_BITS' to vmcoreinfo
2019-02-18 15:27 ` Steve Capper@ 2019-02-21 16:08 ` Bhupesh Sharma0 siblings, 0 replies; 29+ messages in thread
From: Bhupesh Sharma @ 2019-02-21 16:08 UTC (permalink / raw)
To: Steve Capper
Cc: Mark Rutland, Kazuhito Hagio, lijiang, bhe, ard.biesheuvel,
Catalin Marinas, kexec, Will Deacon, AKASHI Takahiro,
James Morse, Kristina Martsenko, Borislav Petkov, anderson, nd,
Dave Young, linux-arm-kernel
Hi Steve,
On 02/18/2019 08:57 PM, Steve Capper wrote:
> Hi Bhupesh,
>
> Sorry for joining this thread late...
>
> On Fri, Feb 15, 2019 at 11:31:56PM +0530, Bhupesh Sharma wrote:
>> Hi James,
>>
>> On Fri, Feb 15, 2019 at 11:04 PM James Morse <james.morse@arm.com> wrote:
>>>
>>> Hi guys,
>>>
>>> (CC: +Steve, +Kristina) "What's the best way of letting user-space know the MMU
>>> config when 52-bit VA and pointer-auth may be in use?"
>>>
>>> On 13/02/2019 19:52, Kazuhito Hagio wrote:
>>>> On 2/13/2019 1:22 PM, James Morse wrote:
>>>>> On 13/02/2019 11:15, Dave Young wrote:
>>>>>> On 02/12/19 at 11:03pm, Kazuhito Hagio wrote:
>>>>>>> On 2/12/2019 2:59 PM, Bhupesh Sharma wrote:
>>>>>>>> BTW, in the makedumpfile enablement patch thread for ARMv8.2 LVA
>>>>>>>> (which I sent out for 52-bit User space VA enablement) (see [0]), Kazu
>>>>>>>> mentioned that the changes look necessary.
>>>>>>>>
>>>>>>>> [0]. http://lists.infradead.org/pipermail/kexec/2019-February/022431.html
>>>>>>>
>>>>>>>>>> The increased 'PTRS_PER_PGD' value for such cases needs to be then
>>>>>>>>>> calculated as is done by the underlying kernel
>>>>>
>>>>> Aha! Nothing to do with which-bits-are-pfn in the tables...
>>>>>
>>>>> You need to know if the top level PGD is 512bytes or bigger. As we use a
>>>>> kmem-cache the adjacent data could be some else's page tables.
>>>>>
>>>>> Is this really a problem though? You can't pull the user-space pgd pointers out
>>>>> of no-where, you must have walked some task_struct and struct_mm's to find them.
>>>>> In which case you would have the VMAs on hand to tell you if its in the mapped
>>>>> user range.
>>>>>
>>>>> It would be good to avoid putting something arch-specific in here if we can at
>>>>> all help it.
>>>
>>>>>>>>>> (see
>>>>>>>>>> 'arch/arm64/include/asm/pgtable-hwdef.h' for details):
>>>>>>>>>>
>>>>>>>>>> #define PTRS_PER_PGD (1 << (MAX_USER_VA_BITS - PGDIR_SHIFT))
>>>>>>>
>>>>>>> Yes, this is the reason why makedumpfile needs the MAX_USER_VA_BITS.
>>>>>>> It is used for pgd_index() also in makedumpfile to walk page tables.
>>>>>>>
>>>>>>> /* to find an entry in a page-table-directory */
>>>>>>> #define pgd_index(addr) (((addr) >> PGDIR_SHIFT) & (PTRS_PER_PGD - 1))
>>>>>>
>>>>>> Since Dave mentioned crash tool does not need it, but crash should also
>>>>>> travel the pg tables.
>>>>
>>>> The crash utility is always invoked with vmlinux, so it can read the
>>>> vabits_user variable directly from vmcore, but makedumpfile can not.
>>>
>>> (This sounds fragile. That symbol's name may change, it may disappear
>>> completely! ... but I guess crash changes with every kernel release anyway)
>>>
>>>
>>>>>> If this is really necessary it would be good to describe what will
>>>>>> happen without the patch, eg. some user visible error from an actual test etc.
>>>>>
>>>>> Yes please, it would really help if there was a specific example we could discuss.
>>>>
>>>> With 52-bit user space and 48-bit kernel space configuration,
>>>> makedumpfile will not be able to convert a virtual kernel address
>>>> to a physical address, and fail to capture a dumpfile, because the
>>>> pgd_index() will return a wrong index.
>>>
>>> Got it, thanks!
>>> (all this user stuff had me thinking it was user-space you were trying to walk).
>>>
>>> Yes, this is because of commit e842dfb5a2d3 ("arm64: mm: Offset TTBR1 to allow
>>> 52-bit PTRS_PER_PGD"). The kernel has offset the ttbr1 value, if you try and
>>> walk it without knowing the offset you get junk.
>>>
>>> Ideally we tell you the offset with some 'ttbr1_offset=' in vmcoreinfo, but if
>>> the offsetting code disappears, the kernel would still have to provide
>>> 'ttbr1_offset=0' for user-space to keep working.
>>>
>>> I'd like to find something future-proof that always has an unambiguous meaning,
>>> and isn't a problem if the kernel variable/symbol/kconfig names change.
>>>
>>> With pointer-auth in use too you can't guess which bits are address and which
>>> bits are data.
>>>
>>> Taking arch-specific to its extreme, we could expose TCR_EL1, but this is a
>>> problem if we ever switch that per task (some new bits may turn up with a new
>>> feature). Some of those bits vary per cpu too, so we'd have to mask them out in
>>> case user-space tries to conclude something from them.
>>>
>>>
>>> My current best suggestion is to export:
>>> from core code:
>>> * USER_MMAP_END, the maximum value a user-space can try and mmap().
>>> This would normally be TASK_SIZE, but x86 and powerpc also have support for
>>> larger VA space, and its plumbed into mm slightly differently. We should have
>>> one arch-independent property that covers all these. On arm64 this would be the
>>> runtime va bits for user-space's TTBR. (This assumes the value isn't per-task)
>>>
>>> arch specific:
>>> * ARM64_TCR.T1SZ, the va bits mapped by the kernel's TTBR. (We can assume we'll
>>> never flip user/kernel space). This has to be arch specific, it will always have
>>> a value and its meaning comes from the ARM-ARM (so linux can't change it in the
>>> future). It should be the same on every CPU.
>>> * ARM64_TTBR1.BADDR, the pa of the kernel page tables, which implicitly has the
>>> offset. Again this always has a value, and its meaning comes from the ARM-ARM.
>>> If we ever get clever with different page-tables/TCR values on different CPUs,
>>> these two should come from the same CPU.
>>>
>>>
>>> I think this gives you what you need if user/kernel may both be using
>>> pointer-auth and both may be using 52-bit va. I'm pretty sure the 48:52 bits can
>>> be picked at boot time depending on the kernel kconfig and the hardware support.
>>>
>>> Does anyone have a better idea? (or a corner where this won't work?)
>>
>> I am not sure you got a chance to look at the two regression cases I
>> reported here:
>> <http://lists.infradead.org/pipermail/kexec/2019-February/022449.html>
>>
>> Unfortunately the above suggestion doesn't provide any fix for
>> ARMv8.2-LPA regression (see text under heading '
>> (1). Regression Case 1 (ARMv8.2-LPA enabled kernel)')
>>
>> After going through the regression reports, I think exporting
>> 'MAX_USER_VA_BITS' and 'MAX_PHYSMEM_BITS' to vmcoreinfo is sufficient
>> for the above regressions (without over-complicating the stuff) as
>> ARM64_TCR.T1SZ and friends seem to arch specific as compared to
>> VA_BITS + 'MAX_USER_VA_BITS' .
>>
>
> For MAX_USER_VA_BITS, IIUC you are just after a value of PTRS_PER_PGD?
> Why not just add PTRS_PER_PGD to the vmcoreinfo?
That's a good suggestion. I will re-spin the v2 with the same.
> FWIW it is possible in vaddr_to_paddr_arm64 to detect a zero pgd entry
> then try again with another ptrs_per_pgd value (granted this is a little
> hacky).
Right, but having this hack replicated across various user-space tools
is perhaps not the ideal portable solution, when we can simply add a
valid hint in the vmcoreinfo itself.
Thanks,
Bhupesh
_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel^permalinkrawreply [flat|nested] 29+ messages in thread

*Re: [PATCH] arm64, vmcoreinfo : Append 'MAX_USER_VA_BITS' and 'MAX_PHYSMEM_BITS' to vmcoreinfo
2019-02-19 20:47 ` Kazuhito Hagio@ 2019-02-21 16:20 ` Bhupesh Sharma
2019-02-21 16:42 ` Dave Anderson0 siblings, 1 reply; 29+ messages in thread
From: Bhupesh Sharma @ 2019-02-21 16:20 UTC (permalink / raw)
To: Kazuhito Hagio
Cc: Mark Rutland, lijiang, bhe, ard.biesheuvel, catalin.marinas,
Steve Capper, kexec, Will Deacon, AKASHI Takahiro, James Morse,
Kristina Martsenko, Borislav Petkov, anderson, Dave Young,
linux-arm-kernel
Hi Kazu,
On 02/20/2019 02:17 AM, Kazuhito Hagio wrote:
> Hi Bhupesh,
>
> -----Original Message-----
>> I am not sure you got a chance to look at the two regression cases I
>> reported here:
>> <http://lists.infradead.org/pipermail/kexec/2019-February/022449.html>
>>
>> Unfortunately the above suggestion doesn't provide any fix for
>> ARMv8.2-LPA regression (see text under heading '
>> (1). Regression Case 1 (ARMv8.2-LPA enabled kernel)')
>
> As for MAX_PHYSMEM_BITS, I realized that ppc64 makedumpfile can detect
> it because there is only one SECTION_SIZE_BITS for ppc64. I think we
> can use the same way as set_ppc64_max_physmem_bits() does also for
> arm64 for now. I'm going to write it for kernels not having
> NUMBER(MAX_PHYSMEM_BITS) in vmcoreinfo.
I see two drawbacks with the above approach:
a). This means that other user-space tools like crash-utility would
still be broken and would probably need to find MAX_PHYSMEM_BITS for
arm64 via a similar (hack'ish ?) approach.
b). I am looking at the makedumpfile code for 'MAX_PHYSMEM_BITS'
determination for two archs as an example:
ppc
---
int
set_ppc64_max_physmem_bits(void)
{
long array_len = ARRAY_LENGTH(mem_section);
/*
* The older ppc64 kernels uses _MAX_PHYSMEM_BITS as 42 and the
* newer kernels 3.7 onwards uses 46 bits.
*/
info->max_physmem_bits = _MAX_PHYSMEM_BITS_ORIG ;
if ((array_len == (NR_MEM_SECTIONS() /
_SECTIONS_PER_ROOT_EXTREME()))
|| (array_len == (NR_MEM_SECTIONS() /
_SECTIONS_PER_ROOT())))
return TRUE;
info->max_physmem_bits = _MAX_PHYSMEM_BITS_3_7;
if ((array_len == (NR_MEM_SECTIONS() /
_SECTIONS_PER_ROOT_EXTREME()))
|| (array_len == (NR_MEM_SECTIONS() /
_SECTIONS_PER_ROOT())))
return TRUE;
info->max_physmem_bits = _MAX_PHYSMEM_BITS_4_19;
if ((array_len == (NR_MEM_SECTIONS() /
_SECTIONS_PER_ROOT_EXTREME()))
|| (array_len == (NR_MEM_SECTIONS() /
_SECTIONS_PER_ROOT())))
return TRUE;
info->max_physmem_bits = _MAX_PHYSMEM_BITS_4_20;
if ((array_len == (NR_MEM_SECTIONS() /
_SECTIONS_PER_ROOT_EXTREME()))
|| (array_len == (NR_MEM_SECTIONS() /
_SECTIONS_PER_ROOT())))
return TRUE;
return FALSE;
}
x86_64:
------
int
get_versiondep_info_x86_64(void)
{
/*
* On linux-2.6.26, MAX_PHYSMEM_BITS is changed to 44 from 40.
*/
if (info->kernel_version < KERNEL_VERSION(2, 6, 26))
info->max_physmem_bits = _MAX_PHYSMEM_BITS_ORIG;
else if (info->kernel_version < KERNEL_VERSION(2, 6, 31))
info->max_physmem_bits = _MAX_PHYSMEM_BITS_2_6_26;
else if(check_5level_paging())
info->max_physmem_bits = _MAX_PHYSMEM_BITS_5LEVEL;
else
info->max_physmem_bits = _MAX_PHYSMEM_BITS_2_6_31;
...
}
Looking at the above, two questions come to my mind:
- Do we really need all the above complexity in user-space code, to hoop
across various kernel versions and perform allocations for something
that can be so easily exported via vmcoreinfo? Also we need to see how
portable is the above code for a new kernel version - IMO, it will need
another fix patch when we update to a new kernel version in near future.
- Also do we need to replicate the above implementations across
user-space tools when they can also utilize the vmcoreinfo information
to determine the PA_BITS range without any additional arch/kernel
version specific details as the single point of obtaining this
information from the kernel?
So, in view of the above, I would still advocate that we use a
vmcoreinfo export for 'MAX_PHYSMEM_BITS' as well to have a uniform
interface for the same across all user-land applications.
Thanks,
Bhupesh
_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel^permalinkrawreply [flat|nested] 29+ messages in thread