*Re: [RFC PATCH v2 18/27] x86/cet/shstk: Introduce WRUSS instruction
2018-07-12 22:59 ` Yu-cheng Yu@ 2018-07-12 23:49 ` Dave Hansen
2018-07-13 1:50 ` Dave Hansen0 siblings, 1 reply; 123+ messages in thread
From: Dave Hansen @ 2018-07-12 23:49 UTC (permalink / raw)
To: Yu-cheng Yu, x86, H. Peter Anvin, Thomas Gleixner, Ingo Molnar,
linux-kernel, linux-doc, linux-mm, linux-arch, linux-api,
Arnd Bergmann, Andy Lutomirski, Balbir Singh, Cyrill Gorcunov,
Florian Weimer, H.J. Lu, Jann Horn, Jonathan Corbet, Kees Cook,
Mike Kravetz, Nadav Amit, Oleg Nesterov, Pavel Machek,
Peter Zijlstra, Ravi V. Shankar, Vedvyas Shanbhogue
On 07/12/2018 03:59 PM, Yu-cheng Yu wrote:
> On Tue, 2018-07-10 at 16:48 -0700, Dave Hansen wrote:
>>>
>>> +/*
>>> + * WRUSS is a kernel instrcution and but writes to user
>>> + * shadow stack memory. When a fault occurs, both
>>> + * X86_PF_USER and X86_PF_SHSTK are set.
>>> + */
>>> +static int is_wruss(struct pt_regs *regs, unsigned long error_code)
>>> +{
>>> + return (((error_code & (X86_PF_USER | X86_PF_SHSTK)) ==
>>> + (X86_PF_USER | X86_PF_SHSTK)) && !user_mode(regs));
>>> +}
>> I thought X86_PF_USER was set based on the mode in which the fault
>> occurred. Does this mean that the architecture of this bit is different
>> now?
>
> Yes.
>
>> That seems like something we need to call out if so. It also means we
>> need to update the SDM because some of the text is wrong.
>
> It needs to mention the WRUSS case.
Ugh. The documentation for this is not pretty. But, I guess this is
not fundamentally different from access to U=1 pages when SMAP is in
place and we've set EFLAGS.AC=1.
But, sheesh, we need to call this out really explicitly and make it
crystal clear what is going on.
We need to go through the page fault code very carefully and audit all
the X86_PF_USER spots and make sure there's no impact to those. SMAP
should mean that we already dealt with these, but we still need an audit.
The docs[1] are clear as mud on this though: "Page entry has user
privilege (U=1) for a supervisor-level shadow-stack-load,
shadow-stack-store-intent or shadow-stack-store access except those that
originate from the WRUSS instruction."
Or, in short:
"Page has U=1 ... except those that originate from the WRUSS
instruction."
Which is backwards from what you said. I really wish those docs had
reused the established SDM language instead of reinventing their own way
of saying things.
1.
https://software.intel.com/sites/default/files/managed/4d/2a/control-flow-enforcement-technology-preview.pdf^permalinkrawreply [flat|nested] 123+ messages in thread

*Re: [RFC PATCH v2 18/27] x86/cet/shstk: Introduce WRUSS instruction
2018-07-12 23:49 ` Dave Hansen@ 2018-07-13 1:50 ` Dave Hansen
2018-07-13 2:21 ` Andy Lutomirski0 siblings, 1 reply; 123+ messages in thread
From: Dave Hansen @ 2018-07-13 1:50 UTC (permalink / raw)
To: Dave Hansen, Yu-cheng Yu, x86, H. Peter Anvin, Thomas Gleixner,
Ingo Molnar, linux-kernel, linux-doc, linux-mm, linux-arch,
linux-api, Arnd Bergmann, Andy Lutomirski, Balbir Singh,
Cyrill Gorcunov, Florian Weimer, H.J. Lu, Jann Horn,
Jonathan Corbet, Kees Cook, Mike Kravetz, Nadav Amit,
Oleg Nesterov, Pavel Machek, Peter Zijlstra, Ravi V. Shankar,
Vedvyas Shanbhogue
On 07/12/2018 04:49 PM, Dave Hansen wrote:
>>> That seems like something we need to call out if so. It also means we
>>> need to update the SDM because some of the text is wrong.
>> It needs to mention the WRUSS case.
> Ugh. The documentation for this is not pretty. But, I guess this is
> not fundamentally different from access to U=1 pages when SMAP is in
> place and we've set EFLAGS.AC=1.
I was wrong and misread the docs. We do not get X86_PF_USER set when
EFLAGS.AC=1.
But, we *do* get X86_PF_USER (otherwise defined to be set when in ring3)
when running in ring0 with the WRUSS instruction and some other various
shadow-stack-access-related things. I'm sure folks had a good reason
for this architecture, but it is a pretty fundamentally *new*
architecture that we have to account for.
This new architecture is also not spelled out or accounted for in the
SDM as of yet. It's only called out here as far as I know:
https://software.intel.com/sites/default/files/managed/4d/2a/control-flow-enforcement-technology-preview.pdf
Which reminds me: Yu-cheng, do you have a link to the docs anywhere in
your set? If not, you really should.
^permalinkrawreply [flat|nested] 123+ messages in thread

*Re: [RFC PATCH v2 18/27] x86/cet/shstk: Introduce WRUSS instruction
2018-07-13 1:50 ` Dave Hansen@ 2018-07-13 2:21 ` Andy Lutomirski
2018-07-13 4:16 ` Dave Hansen0 siblings, 1 reply; 123+ messages in thread
From: Andy Lutomirski @ 2018-07-13 2:21 UTC (permalink / raw)
To: Dave Hansen
Cc: Dave Hansen, Yu-cheng Yu, x86, H. Peter Anvin, Thomas Gleixner,
Ingo Molnar, linux-kernel, linux-doc, linux-mm, linux-arch,
linux-api, Arnd Bergmann, Balbir Singh, Cyrill Gorcunov,
Florian Weimer, H.J. Lu, Jann Horn, Jonathan Corbet, Kees Cook,
Mike Kravetz, Nadav Amit, Oleg Nesterov, Pavel Machek,
Peter Zijlstra, Ravi V. Shankar, Vedvyas Shanbhogue
> On Jul 12, 2018, at 6:50 PM, Dave Hansen <dave.hansen@intel.com> wrote:
>
> On 07/12/2018 04:49 PM, Dave Hansen wrote:
>>>> That seems like something we need to call out if so. It also means we
>>>> need to update the SDM because some of the text is wrong.
>>> It needs to mention the WRUSS case.
>> Ugh. The documentation for this is not pretty. But, I guess this is
>> not fundamentally different from access to U=1 pages when SMAP is in
>> place and we've set EFLAGS.AC=1.
>
> I was wrong and misread the docs. We do not get X86_PF_USER set when
> EFLAGS.AC=1.
>
> But, we *do* get X86_PF_USER (otherwise defined to be set when in ring3)
> when running in ring0 with the WRUSS instruction and some other various
> shadow-stack-access-related things. I'm sure folks had a good reason
> for this architecture, but it is a pretty fundamentally *new*
> architecture that we have to account for.
I think it makes (some) sense. The USER bit is set for a page fault that was done with user privilege. So a descriptor table fault at CPL 3 has USER clear (regardless of the cause of the fault) and WRUSS has USER set.
>
> This new architecture is also not spelled out or accounted for in the
> SDM as of yet. It's only called out here as far as I know:
> https://software.intel.com/sites/default/files/managed/4d/2a/control-flow-enforcement-technology-preview.pdf
>
> Which reminds me: Yu-cheng, do you have a link to the docs anywhere in
> your set? If not, you really should.
I am tempted to suggest that the whole series not be merged until there are actual docs. It’s not a fantastic precedent.
^permalinkrawreply [flat|nested] 123+ messages in thread

*Re: [RFC PATCH v2 16/27] mm: Modify can_follow_write_pte/pmd for shadow stack
2018-07-18 23:10 ` Yu-cheng Yu@ 2018-07-19 0:06 ` Dave Hansen
2018-07-19 17:06 ` Yu-cheng Yu0 siblings, 1 reply; 123+ messages in thread
From: Dave Hansen @ 2018-07-19 0:06 UTC (permalink / raw)
To: Yu-cheng Yu, x86, H. Peter Anvin, Thomas Gleixner, Ingo Molnar,
linux-kernel, linux-doc, linux-mm, linux-arch, linux-api,
Arnd Bergmann, Andy Lutomirski, Balbir Singh, Cyrill Gorcunov,
Florian Weimer, H.J. Lu, Jann Horn, Jonathan Corbet, Kees Cook,
Mike Kravetz, Nadav Amit, Oleg Nesterov, Pavel Machek,
Peter Zijlstra, Ravi V. Shankar, Vedvyas Shanbhogue
>>> -static inline bool can_follow_write_pte(pte_t pte, unsigned int flags)
>>> +static inline bool can_follow_write(pte_t pte, unsigned int flags,
>>> + struct vm_area_struct *vma)
>>> {
>>> - return pte_write(pte) ||
>>> - ((flags & FOLL_FORCE) && (flags & FOLL_COW) && pte_dirty(pte));
>>> + if (!is_shstk_mapping(vma->vm_flags)) {
>>> + if (pte_write(pte))
>>> + return true;
>> Let me see if I can say this another way.
>>
>> The bigger issue is that these patches change the semantics of
>> pte_write(). Before these patches, it meant that you *MUST* have this
>> bit set to write to the page controlled by the PTE. Now, it means: you
>> can write if this bit is set *OR* the shadowstack bit combination is set.
>
> Here, we only figure out (1) if the page is pointed by a writable PTE; or
> (2) if the page is pointed by a RO PTE (data or SHSTK) and it has been
> copied and it still exists. We are not trying to
> determine if the
> SHSTK PTE is writable (we know it is not).
Please think about the big picture. I'm not just talking about this
patch, but about every use of pte_write() in the kernel.
>> That's the fundamental problem. We need some code in the kernel that
>> logically represents the concept of "is this PTE a shadowstack PTE or a
>> PTE with the write bit set", and we will call that pte_write(), or maybe
>> pte_writable().
>>
>> You *have* to somehow rectify this situation. We can absolutely no
>> leave pte_write() in its current, ambiguous state where it has no real
>> meaning or where it is used to mean _both_ things depending on context.
>
> True, the processor can always write to a page through a shadow stack
> PTE, but it must do that with a CALL instruction. Can we define a
> write operation as: MOV r1, *(r2). Then we don't have any doubt on
> pte_write() any more.
No, we can't just move the target. :)
You can define it this way, but then you also need to go to every spot
in the kernel that calls pte_write() (and _PAGE_RW in fact) and audit it
to ensure it means "mov ..." and not push.
^permalinkrawreply [flat|nested] 123+ messages in thread