On Mon, Dec 03, 2018 at 11:39:47PM -0800, Alison Schofield wrote:
> (Multi-Key Total Memory Encryption)
I think that MKTME is a horrible name, and doesn't appear to accurately
describe what it does either. Specifically, the 'total' seems out of
place; it doesn't require all memory to be encrypted.

On Tue, Dec 04, 2018 at 09:25:50AM +0000, Peter Zijlstra wrote:
> On Mon, Dec 03, 2018 at 11:39:47PM -0800, Alison Schofield wrote:
> > (Multi-Key Total Memory Encryption)
>
> I think that MKTME is a horrible name, and doesn't appear to accurately
> describe what it does either. Specifically the 'total' seems out of
> place, it doesn't require all memory to be encrypted.
MKTME implies TME. TME is enabled by the BIOS and encrypts all memory
with a CPU-generated key. MKTME allows the use of other keys, or
disabling encryption entirely, for a page.
But, yes, name is not good.
--
Kirill A. Shutemov

On Mon, Dec 3, 2018 at 11:37 PM Alison Schofield
<alison.schofield@intel.com> wrote:
>
> Hi Thomas, David,
>
> Here is an updated RFC on the API's to support MKTME.
> (Multi-Key Total Memory Encryption)
>
> This RFC presents the 2 API additions to support the creation and
> usage of memory encryption keys:
> 1) Kernel Key Service type "mktme"
> 2) System call encrypt_mprotect()
>
> This patchset is built upon Kirill Shutemov's work for the core MKTME
> support.
>
> David: Please let me know if the changes made, based on your review,
> are reasonable. I don't think that the new changes touch key service
> specific areas (much).
>
> Thomas: Please provide feedback on encrypt_mprotect(). If not a
> review, then a direction check would be helpful.
>
I'm not Thomas, but I think it's the wrong direction. As it stands,
encrypt_mprotect() is an incomplete version of mprotect() (since it's
missing the protection key support), and it's also functionally just
MADV_DONTNEED. In other words, the sole user-visible effect appears
to be that the existing pages are blown away. The fact that it
changes the key in use doesn't seem terribly useful, since it's
anonymous memory, and the most secure choice is to use CPU-managed
keying, which appears to be the default anyway on TME systems. It
also has totally unclear semantics WRT swap, and, off the top of my
head, it looks like it may have serious cache-coherency issues and
like swapping the pages might corrupt them, both because there are no
flushes and because the direct-map alias looks like it will use the
default key and therefore appear to contain the wrong data.
I would propose a very different direction: don't try to support MKTME
at all for anonymous memory, and instead figure out the important use
cases and support them directly. The use cases that I can think of
off the top of my head are:
1. pmem. This should probably use a very different API.
2. Some kind of VM hardening, where a VM's memory can be protected a
little tiny bit from the main kernel. But I don't see why this is any
better than XPO (eXclusive Page-frame Ownership), which brings to
mind:
The main implementation concern I have with this patch set is cache
coherency and handling of the direct map. Unless I missed something,
you're not doing anything about the direct map, which means that you
have RW aliases of the same memory with different keys. For use case
#2, this probably means that you need to either get rid of the direct
map and make get_user_pages() fail, or you need to change the key on
the direct map as well, probably using the pageattr.c code.
As for caching, as far as I can tell from reading the preliminary
docs, Intel's MKTME, much like AMD's SME, is basically invisible to
the hardware cache coherency mechanism. So, if you modify a physical
address with one key (or SME-enable bit), and you read it with
another, you get garbage unless you flush. And, if you modify memory
with one key then remap it with a different key without flushing in
the mean time, you risk corruption. And, what's worse, if I'm reading
between the lines in the docs correctly, if you use PCONFIG to change
a key, you may need to do a bunch of cache flushing to ensure you get
reasonable effects. (If you have dirty cache lines for some (PA, key)
and you PCONFIG to change the underlying key, you get different
results depending on whether the writeback happens before or after the
package doing the writeback notices the PCONFIG.)
Finally, if you're going to teach the kernel how to have some user
pages that aren't in the direct map, you've essentially done XPO,
which is nifty but expensive. And I think that doing this gets you
essentially all the benefit of MKTME for the non-pmem use case. Why
exactly would any software want to use anything other than a
CPU-managed key for anything other than pmem?
--Andy

On Tue, Dec 4, 2018 at 11:19 AM Andy Lutomirski <luto@kernel.org> wrote:
>
> On Mon, Dec 3, 2018 at 11:37 PM Alison Schofield
> <alison.schofield@intel.com> wrote:
> >
> Finally, If you're going to teach the kernel how to have some user
> pages that aren't in the direct map, you've essentially done XPO,
> which is nifty but expensive. And I think that doing this gets you
> essentially all the benefit of MKTME for the non-pmem use case. Why
> exactly would any software want to use anything other than a
> CPU-managed key for anything other than pmem?
>
Let me say this less abstractly. Here's a concrete proposal: make a
new memfd_create() flag like MEMFD_ISOLATED. The
semantics are that the underlying pages are made not-present in the
direct map when they're allocated (which is hideously slow, but so be
it), and that anything that tries to get_user_pages() the resulting
pages fails. And then make sure we have all the required APIs so that
QEMU can still map this stuff into a VM.
If there is indeed a situation in which MKTME-ifying the memory adds
some value, then we can consider doing that.
And maybe we get fancy and encrypt this memory when it's swapped, but
maybe we should just encrypt everything when it's swapped.

On 12/4/18 12:00 PM, Andy Lutomirski wrote:
> On Tue, Dec 4, 2018 at 11:19 AM Andy Lutomirski <luto@kernel.org> wrote:
>> On Mon, Dec 3, 2018 at 11:37 PM Alison Schofield <alison.schofield@intel.com> wrote:
>> Finally, If you're going to teach the kernel how to have some user
>> pages that aren't in the direct map, you've essentially done XPO,
>> which is nifty but expensive. And I think that doing this gets you
>> essentially all the benefit of MKTME for the non-pmem use case. Why
>> exactly would any software want to use anything other than a
>> CPU-managed key for anything other than pmem?
>
> Let me say this less abstractly. Here's a somewhat concrete actual
> proposal. Make a new memfd_create() flag like MEMFD_ISOLATED. The
> semantics are that the underlying pages are made not-present in the
> direct map when they're allocated (which is hideously slow, but so be
> it), and that anything that tries to get_user_pages() the resulting
> pages fails. And then make sure we have all the required APIs so that
> QEMU can still map this stuff into a VM.
I think we need get_user_pages(). We want direct I/O to work, *and* we
really want direct device assignment into VMs.
> And maybe we get fancy and encrypt this memory when it's swapped, but
> maybe we should just encrypt everything when it's swapped.
We decided long ago (and this should be in the patches somewhere) that
we wouldn't force memory to be encrypted in swap. We would just
recommend it in the documentation as a best practice, especially when
using MKTME.
We can walk that back, of course, but that's what we're doing at the moment.

On Tue, Dec 04, 2018 at 09:30:20PM -0800, Alison Schofield wrote:
> On Tue, Dec 04, 2018 at 10:10:44AM +0100, Peter Zijlstra wrote:
> > > + * Encrypted mprotect is only supported on anonymous mappings.
> > > + * All VMA's in the requested range must be anonymous. If this
> > > + * test fails on any single VMA, the entire mprotect request fails.
> > > + */
> > > +bool mem_supports_encryption(struct vm_area_struct *vma, unsigned long end)
> >
> > That's a 'weird' interface and cannot do what the comment says it should
> > do.
>
> More please? With MKTME, only anonymous memory supports encryption.
> Is it the naming that's weird, or you don't see it doing what it says?
It's weird because you don't fully specify the range -- i.e. it cannot
verify the vma argument. It is also weird because the start and end are
not of the same type -- or rather, there is no start at all.
So while the comment talks about a range, there is not in fact a range
(only the implied @start is somewhere inside @vma). The comment also
states that all VMAs in the range must be anonymous but, lacking a
range specification, it cannot verify that statement.
Now, I don't necessarily object to the function and its implementation,
but that comment is just plain misleading.

On Wed, Dec 05, 2018 at 10:10:29AM +0100, Peter Zijlstra wrote:
> On Tue, Dec 04, 2018 at 09:43:53PM -0800, Alison Schofield wrote:
> > On Tue, Dec 04, 2018 at 10:21:45AM +0100, Peter Zijlstra wrote:
> > > On Mon, Dec 03, 2018 at 11:39:58PM -0800, Alison Schofield wrote:
> >
> > > How is that serialized and kept relevant in the face of hotplug?
> > mktme_leadcpus is updated on hotplug startup and teardowns.
>
> Not in this patch it is not. That is added in a subsequent patch, which
> means that during bisection hotplug is utterly wrecked if you happen to
> land between these patches, that is bad.
>
The Key Service support is split between 4 main patches (10-13), but
the dependencies go further back in the patchset.
If the bisect need outweighs any benefit from reviewing in pieces,
then these patches can be squashed to a single patch:
keys/mktme: Add the MKTME Key Service type for memory encryption
keys/mktme: Program memory encryption keys on a system wide basis
keys/mktme: Save MKTME data if kernel cmdline parameter allows
keys/mktme: Support CPU Hotplug for MKTME keys
Am I interpreting your point correctly?
Thanks,
Alison

> On Dec 3, 2018, at 11:39 PM, Alison Schofield <alison.schofield@intel.com> wrote:
I realize you’re writing code to expose hardware behavior, but I’m not sure this
really makes sense in this context.
> .
> +
> +Usage
> +-----
> + When using the Kernel Key Service to request an *mktme* key,
> + specify the *payload* as follows:
> +
> + type=
> + *user* User will supply the encryption key data. Use this
> + type to directly program a hardware encryption key.
> +
I think that “user” probably makes sense as a “key service” key, but I don’t think it is at all useful for non-persistent memory. Even if we take for granted that MKTME for anonymous memory is useful at all, “cpu” seems to be better in all respects.
Perhaps support for “user” should be tabled until there’s a design for how to use this for pmem? I imagine it would look quite a bit like dm-crypt. Advanced pmem filesystems could plausibly use different keys for different files, I suppose.
If “user” is dropped, I think a lot of the complexity goes away. Hotplug becomes automatic, right?
> + *cpu* User requests a CPU generated encryption key.
Okay, maybe, but it’s still unclear to me exactly what the intended benefit is, though.
> + The CPU generates and assigns an ephemeral key.
> +
> + *clear* User requests that a hardware encryption key be
> + cleared. This will clear the encryption key from
> + the hardware. On execution this hardware key gets
> + TME behavior.
> +
Why is this a key type? Shouldn’t the API to select a key just have an option to ask for no key to be used?
> + *no-encrypt*
> + User requests that hardware does not encrypt
> + memory when this key is in use.
Same as above. If there’s a performance benefit, then there could be a way to ask for cleartext memory. Similarly, some pmem users may want a way to keep their pmem unencrypted.
—Andy

On Wed, Dec 05, 2018 at 10:11:18AM -0800, Andy Lutomirski wrote:
>
>
> > On Dec 3, 2018, at 11:39 PM, Alison Schofield <alison.schofield@intel.com> wrote:
>
> I realize you’re writing code to expose hardware behavior, but I’m not sure this
> really makes sense in this context.
Your observation is accurate. The Usage defined here is very closely
aligned to the Intel MKTME Architecture spec. That's a starting point,
but not the ending point. We need to implement the feature set that
makes sense. More below...
> > +
> > + type=
> > + *user* User will supply the encryption key data. Use this
> > + type to directly program a hardware encryption key.
> > +
>
> I think that “user” probably sense as a “key service” key, but I don’t think it is at all useful for non-persistent memory. Even if we take for granted that MKTME for anonymous memory is useful at all, “cpu” seems to be better in all respects.
>
>
> Perhaps support for “user” should be tabled until there’s a design for how to use this for pmem? I imagine it would look quite a bit like dm-crypt. Advanced pmem filesystems could plausibly use different keys for different files, I suppose.
>
> If “user” is dropped, I think a lot of the complexity goes away. Hotplug becomes automatic, right?
Dropping 'user' type removes a great deal of complexity.
Let me follow up in 2 ways:
1) Find out when MKTME support for pmem is required.
2) Go back to the requirements and get the justification for the user
type.
>
> > + *cpu* User requests a CPU generated encryption key.
>
> Okay, maybe, but it’s still unclear to me exactly what the intended benefit is, though.
*cpu* is the random key generated by the CPU. If there were no other
options, then this would be the default, and could go away.
> > + *clear* User requests that a hardware encryption key be
> > + cleared. This will clear the encryption key from
> > + the hardware. On execution this hardware key gets
> > + TME behavior.
> > +
>
> Why is this a key type? Shouldn’t the API to select a key just have an option to ask for no key to be used?
The *clear* key has been requested in order to clear/erase the user's
key data that has been programmed into a hardware slot. Users do not
want to leave a slot programmed with their encryption data when they
are done with it.
> > + *no-encrypt*
> > + User requests that hardware does not encrypt
> > + memory when this key is in use.
>
> Same as above. If there’s a performance benefit, then there could be a way to ask for cleartext memory. Similarly, some pmem users may want a way to keep their pmem unencrypted.
So, this is the way to ask for cleartext memory.
The entire system will be encrypted with the system-wide TME key.
A subset of that will be protected with MKTME keys.
If the user wants no encryption, *no-encrypt* is the way to request it.
Alison
>
> —Andy

On Mon, 2018-12-03 at 23:39 -0800, Alison Schofield wrote:
> Hi Thomas, David,
>
> Here is an updated RFC on the API's to support MKTME.
> (Multi-Key Total Memory Encryption)
>
> This RFC presents the 2 API additions to support the creation and
> usage of memory encryption keys:
> 1) Kernel Key Service type "mktme"
> 2) System call encrypt_mprotect()
>
> This patchset is built upon Kirill Shutemov's work for the core MKTME
> support.
Please explain what MKTME is right here. No references, no
explanations... Even with a reference, a short summary would be really
nice to have.
/Jarkko

On Tue, 2018-12-04 at 12:46 +0300, Kirill A. Shutemov wrote:
> On Tue, Dec 04, 2018 at 09:25:50AM +0000, Peter Zijlstra wrote:
> > On Mon, Dec 03, 2018 at 11:39:47PM -0800, Alison Schofield wrote:
> > > (Multi-Key Total Memory Encryption)
> >
> > I think that MKTME is a horrible name, and doesn't appear to accurately
> > describe what it does either. Specifically the 'total' seems out of
> > place, it doesn't require all memory to be encrypted.
>
> MKTME implies TME. TME is enabled by BIOS and it encrypts all memory with
> CPU-generated key. MKTME allows to use other keys or disable encryption
> for a page.
When you say "disable encryption for a page", does the encryption
actually get disabled, or does the CPU just decrypt it transparently,
i.e. what happens physically?
> But, yes, name is not good.
/Jarkko

On Tue, 2018-12-04 at 11:19 -0800, Andy Lutomirski wrote:
> I'm not Thomas, but I think it's the wrong direction. As it stands,
> encrypt_mprotect() is an incomplete version of mprotect() (since it's
> missing the protection key support), and it's also functionally just
> MADV_DONTNEED. In other words, the sole user-visible effect appears
> to be that the existing pages are blown away. The fact that it
> changes the key in use doesn't seem terribly useful, since it's
> anonymous memory, and the most secure choice is to use CPU-managed
> keying, which appears to be the default anyway on TME systems. It
> also has totally unclear semantics WRT swap, and, off the top of my
> head, it looks like it may have serious cache-coherency issues and
> like swapping the pages might corrupt them, both because there are no
> flushes and because the direct-map alias looks like it will use the
> default key and therefore appear to contain the wrong data.
>
> I would propose a very different direction: don't try to support MKTME
> at all for anonymous memory, and instead figure out the important use
> cases and support them directly. The use cases that I can think of
> off the top of my head are:
>
> 1. pmem. This should probably use a very different API.
>
> 2. Some kind of VM hardening, where a VM's memory can be protected a
> little tiny bit from the main kernel. But I don't see why this is any
> better than XPO (eXclusive Page-frame Ownership), which brings to
> mind:
What is the threat model anyway for the AMD and Intel technologies?
To me it looks like you can read, write, and even replay encrypted
pages in both SME and TME.
/Jarkko

>> On Dec 5, 2018, at 11:22 AM, Alison Schofield <alison.schofield@intel.com> wrote:
>>
>> On Wed, Dec 05, 2018 at 10:11:18AM -0800, Andy Lutomirski wrote:
>>
>>
>>> On Dec 3, 2018, at 11:39 PM, Alison Schofield <alison.schofield@intel.com> wrote:
>>
>> I realize you’re writing code to expose hardware behavior, but I’m not sure this
>> really makes sense in this context.
>
> Your observation is accurate. The Usage defined here is very closely
> aligned to the Intel MKTME Architecture spec. That's a starting point,
> but not the ending point. We need to implement the feature set that
> makes sense. More below...
>
>>> +
>>> + type=
>>> + *user* User will supply the encryption key data. Use this
>>> + type to directly program a hardware encryption key.
>>> +
>>
>> I think that “user” probably sense as a “key service” key, but I don’t think it is at all useful for non-persistent memory. Even if we take for granted that MKTME for anonymous memory is useful at all, “cpu” seems to be better in all respects.
>>
>>
>> Perhaps support for “user” should be tabled until there’s a design for how to use this for pmem? I imagine it would look quite a bit like dm-crypt. Advanced pmem filesystems could plausibly use different keys for different files, I suppose.
>>
>> If “user” is dropped, I think a lot of the complexity goes away. Hotplug becomes automatic, right?
>
> Dropping 'user' type removes a great deal of complexity.
>
> Let me follow up in 2 ways:
> 1) Find out when MKTME support for pmem is required.
> 2) Go back to the the requirements and get the justification for user
> type.
>
>>
>>> + *cpu* User requests a CPU generated encryption key.
>>
>> Okay, maybe, but it’s still unclear to me exactly what the intended benefit is, though.
> *cpu* is the RANDOM key generated by the cpu. If there were no other
> options, then this would be default, and go away.
>
>>> + *clear* User requests that a hardware encryption key be
>>> + cleared. This will clear the encryption key from
>>> + the hardware. On execution this hardware key gets
>>> + TME behavior.
>>> +
>>
>> Why is this a key type? Shouldn’t the API to select a key just have an option to ask for no key to be used?
>
> The *clear* key has been requested in order to clear/erase the users
> key data that has been programmed into a hardware slot. User does not
> want to leave a slot programmed with their encryption data when they
> are done with it.
Can’t you just clear the key when the key is deleted by the user?
Asking the user to allocate a *new* key and hope that it somehow ends
up in the same spot seems like a poor design, especially if future
hardware gains support for key slot virtualization in some way that
makes the slot allocation more dynamic.
>
>>> + *no-encrypt*
>>> + User requests that hardware does not encrypt
>>> + memory when this key is in use.
>>
>> Same as above. If there’s a performance benefit, then there could be a way to ask for cleartext memory. Similarly, some pmem users may want a way to keep their pmem unencrypted.
>
> So, this is the way to ask for cleartext memory.
> The entire system will be encrypted with the system wide TME Key.
> A subset of that will be protected with MKTME Keys.
> If user wants, no encrypt, this *no-encrypt* is the way to do it.
>
Understood. I’m saying that having a *key* (in the add_key sense) for
it seems unnecessary. Whatever the final API for controlling the use
of keys, adding an option to ask for clear text seems reasonable.
This actually seems more useful for anonymous memory than the
CPU-generated keys are, IMO.
I do think that, before you invest too much time in perfecting the
series with the current design, you should identify the use cases,
make sure the use cases are valid, and figure out whether your API
design is appropriate. After considerable head-scratching, I haven’t
thought of a reason that explicit CPU generated keys are any better
than the default TME key, at least in the absence of additional
hardware support for locking down what code can use what key. The
sole exception is that a key can be removed, which is probably faster
than directly zeroing large amounts of data.
I understand that it would be very nice to say "hey, cloud customer,
your VM has all its memory encrypted with a key that is unique to your
VM", but that seems to be more or less just a platitude with no actual
effect. Anyone who snoops the memory bus or steals a DIMM learns
nothing unless they also take control of the CPU and can replay all
the data into the CPU. On the other hand, anyone who can get the CPU
to read from a given physical address (which seems like the most
likely threat) can just get the CPU to decrypt any tenant's data. So,
for example, if someone manages to write a couple of words to the EPT
for one VM, then they can easily read another VM's data, MKTME or no
MKTME, because the memory controller has no clue which VM initiated
the access.
I suppose there's some smallish value in rotating the key every now
and then to make old data non-replayable, but an attack that
compromises the memory bus and only later compromises the CPU is a
strange threat model.

On 12/4/18 11:19 AM, Andy Lutomirski wrote:
> I'm not Thomas, but I think it's the wrong direction. As it stands,
> encrypt_mprotect() is an incomplete version of mprotect() (since it's
> missing the protection key support),
I thought about this when I added mprotect_pkey(). We start with:
mprotect(addr, len, prot);
then
mprotect_pkey(addr, len, prot);
then
mprotect_pkey_encrypt(addr, len, prot, key);
That doesn't scale because we eventually have
mprotect_and_a_history_of_mm_features(). :)
What I was hoping to see was them do this (apologies for the horrible
indentation):
ptr = mmap(..., PROT_NONE);
mprotect_pkey(addr, len, PROT_NONE, pkey);
mprotect_encrypt(addr, len, PROT_NONE, keyid);
mprotect(addr, len, real_prot);
The point is that you *can* stack these things and don't have to have an
mprotect_kitchen_sink() if you use PROT_NONE for intermediate
permissions during setup.
> and it's also functionally just MADV_DONTNEED. In other words, the
> sole user-visible effect appears to be that the existing pages are
> blown away. The fact that it changes the key in use doesn't seem
> terribly useful, since it's anonymous memory,
It's functionally MADV_DONTNEED, plus a future promise that your writes
will never show up as plaintext on the DIMM.
We also haven't settled on the file-backed properties. For file-backed,
my hope was that you could do:
ptr = mmap(fd, size, prot);
printf("ciphertext: %x\n", *ptr);
mprotect_encrypt(ptr, len, prot, keyid);
printf("plaintext: %x\n", *ptr);
> and the most secure choice is to use CPU-managed keying, which
> appears to be the default anyway on TME systems. It also has totally
> unclear semantics WRT swap, and, off the top of my head, it looks
> like it may have serious cache-coherency issues and like swapping the
> pages might corrupt them, both because there are no flushes and
> because the direct-map alias looks like it will use the default key
> and therefore appear to contain the wrong data.
I think we fleshed this out on IRC a bit, but the other part of the
implementation is described here: https://lwn.net/Articles/758313/, and
contains a direct map per keyid. When you do phys_to_virt() and
friends, you get the correct, decrypted direct-map view that is
appropriate for the physical page. And, yes, this has very
consequential security implications.
> I would propose a very different direction: don't try to support MKTME
> at all for anonymous memory, and instead figure out the important use
> cases and support them directly. The use cases that I can think of
> off the top of my head are:
>
> 1. pmem. This should probably use a very different API.
>
> 2. Some kind of VM hardening, where a VM's memory can be protected a
> little tiny bit from the main kernel. But I don't see why this is any
> better than XPO (eXclusive Page-frame Ownership), which brings to
> mind:
The XPO approach is "fun", and would certainly be a way to keep the
direct map from being exploited to get access to plain-text mappings of
ciphertext.
But, it also has massive performance implications and we didn't want
to go there quite yet.
> The main implementation concern I have with this patch set is cache
> coherency and handling of the direct map. Unless I missed something,
> you're not doing anything about the direct map, which means that you
> have RW aliases of the same memory with different keys. For use case
> #2, this probably means that you need to either get rid of the direct
> map and make get_user_pages() fail, or you need to change the key on
> the direct map as well, probably using the pageattr.c code.
The current, public hardware spec has a description of what's required
to maintain cache coherency. Basically, you can keep as many mappings
of a physical page as you want, but only write to one mapping at a time,
and clflush the old one when you want to write to a new one.
> As for caching, As far as I can tell from reading the preliminary
> docs, Intel's MKTME, much like AMD's SME, is basically invisible to
> the hardware cache coherency mechanism. So, if you modify a physical
> address with one key (or SME-enable bit), and you read it with
> another, you get garbage unless you flush. And, if you modify memory
> with one key then remap it with a different key without flushing in
> the mean time, you risk corruption.
Yes, all true (at least with respect to Intel's implementation).
> And, what's worse, if I'm reading
> between the lines in the docs correctly, if you use PCONFIG to change
> a key, you may need to do a bunch of cache flushing to ensure you get
> reasonable effects. (If you have dirty cache lines for some (PA, key)
> and you PCONFIG to change the underlying key, you get different
> results depending on whether the writeback happens before or after the
> package doing the writeback notices the PCONFIG.)
We're not going to allow a key to be PCONFIG'd while there are any
physical pages still associated with it. There are per-VMA refcounts
tied back to the keyid slots, IIRC. So, before PCONFIG can happen, we
just need to make sure that all the VMAs are gone, all the pages are
freed, and all dirty cachelines have been clflushed.
This is where get_user_pages() is our mortal enemy, though. I hope we
got that right. Kirill/Alison, we should chat about this one. :)
> Finally, If you're going to teach the kernel how to have some user
> pages that aren't in the direct map, you've essentially done XPO,
> which is nifty but expensive. And I think that doing this gets you
> essentially all the benefit of MKTME for the non-pmem use case. Why
> exactly would any software want to use anything other than a
> CPU-managed key for anything other than pmem?
It is handy, for one, to let you "cluster" key usage. If you have 5
Pepsi VMs and 5 Coke VMs, each Pepsi one using the same key and each
Coke one using the same key, you can boil it down to only 2 hardware
keyid slots that get used, and do this transparently.
But, I think what you're implying is that the security properties of
user-supplied keys can only be *worse* than using CPU-generated keys
(assuming the CPU does a good job generating it). So, why bother
allowing user-specified keys in the first place?
It's a good question and I don't have a solid answer for why folks want
this. I'll find out.

On Wed, Dec 5, 2018 at 3:49 PM Dave Hansen <dave.hansen@intel.com> wrote:
>
> On 12/4/18 11:19 AM, Andy Lutomirski wrote:
> > I'm not Thomas, but I think it's the wrong direction. As it stands,
> > encrypt_mprotect() is an incomplete version of mprotect() (since it's
> > missing the protection key support),
>
> I thought about this when I added mprotect_pkey(). We start with:
>
> mprotect(addr, len, prot);
>
> then
>
> mprotect_pkey(addr, len, prot);
>
> then
>
> mprotect_pkey_encrypt(addr, len, prot, key);
>
> That doesn't scale because we eventually have
> mprotect_and_a_history_of_mm_features(). :)
>
> What I was hoping to see was them do this (apologies for the horrible
> indentation:
>
> ptr = mmap(..., PROT_NONE);
> mprotect_pkey( addr, len, PROT_NONE, pkey);
> mprotect_encrypt(addr, len, PROT_NONE, keyid);
> mprotect( addr, len, real_prot);
>
> The point is that you *can* stack these things and don't have to have an
> mprotect_kitchen_sink() if you use PROT_NONE for intermediate
> permissions during setup.
Sure, but then why call it mprotect at all? How about:
mmap(..., PROT_NONE);
mencrypt(..., keyid);
mprotect_pkey(...);
But wouldn't this be much nicer:
int fd = memfd_create(...);
memfd_set_tme_key(fd, keyid); /* fails if len != 0 */
mmap(fd, ...);
>
> > and it's also functionally just MADV_DONTNEED. In other words, the
> > sole user-visible effect appears to be that the existing pages are
> > blown away. The fact that it changes the key in use doesn't seem
> > terribly useful, since it's anonymous memory,
>
> It's functionally MADV_DONTNEED, plus a future promise that your writes
> will never show up as plaintext on the DIMM.
But that's mostly vacuous. If I read the docs right, MKTME systems
also support TME, so you *already* have that promise, unless the
firmware totally blew it. If we want a boot option to have the kernel
use MKTME to forcibly encrypt everything regardless of what the TME
MSRs say, I'd be entirely on board. Heck, the implementation would be
quite simple because we mostly reuse the SME code.
>
> We also haven't settled on the file-backed properties. For file-backed,
> my hope was that you could do:
>
> ptr = mmap(fd, size, prot);
> printf("ciphertext: %x\n", *ptr);
> mprotect_encrypt(ptr, len, prot, keyid);
> printf("plaintext: %x\n", *ptr);
Why would you ever want the plaintext? Also, how does this work on a
normal fs, where relocation of the file would cause the ciphertext to
get lost? It really seems to be that it should look more like
dm-crypt where you encrypt a filesystem. Maybe you'd just configure
the pmem device to be encrypted before you mount it, or you'd get a
new pmem-mktme device node instead. This would also avoid some nasty
multiple-copies-of-the-direct-map issue, since you'd only ever have
one of them mapped.
>
> > The main implementation concern I have with this patch set is cache
> > coherency and handling of the direct map. Unless I missed something,
> > you're not doing anything about the direct map, which means that you
> > have RW aliases of the same memory with different keys. For use case
> > #2, this probably means that you need to either get rid of the direct
> > map and make get_user_pages() fail, or you need to change the key on
> > the direct map as well, probably using the pageattr.c code.
>
> The current, public hardware spec has a description of what's required
> to maintain cache coherency. Basically, you can keep as many mappings
> of a physical page as you want, but only write to one mapping at a time,
> and clflush the old one when you want to write to a new one.
Surely you at least have to clflush the old mapping and then the new
mapping, since the new mapping could have been speculatively read.
> > Finally, if you're going to teach the kernel how to have some user
> > pages that aren't in the direct map, you've essentially done XPO,
> > which is nifty but expensive. And I think that doing this gets you
> > essentially all the benefit of MKTME for the non-pmem use case. Why
> > exactly would any software want to use anything other than a
> > CPU-managed key for anything other than pmem?
>
> It is handy, for one, to let you "cluster" key usage. If you have 5
> Pepsi VMs and 5 Coke VMs, each Pepsi one using the same key and each
> Coke one using the same key, you can boil it down to only 2 hardware
> keyid slots that get used, and do this transparently.
I understand this from a marketing perspective but not a security
perspective. Say I'm Coke and you've sold me some VMs that are
"encrypted with a Coke-specific key and no other VMs get to use that
key." I can't think of *any* not-exceedingly-contrived attack in
which this makes the slightest difference. If Pepsi tries to attack
Coke without MKTME, then they'll either need to get the hypervisor to
leak Coke's data through the direct map or they'll have to find some
way to corrupt a page table or use something like L1TF to read from a
physical address Coke owns. With MKTME, if they can read through the
host direct map, then they'll get Coke's cleartext, and if they can
corrupt a page table or use L1TF to read from your memory, they'll get
Coke's cleartext.
TME itself provides a ton of protection -- you can't just barge into
the datacenter, refrigerate the DIMMs, walk away with them, and read
off everyone's data.
Am I missing something?
>
> But, I think what you're implying is that the security properties of
> user-supplied keys can only be *worse* than using CPU-generated keys
> (assuming the CPU does a good job generating it). So, why bother
> allowing user-specified keys in the first place?
That too :)

[ only responding to the pmem side of things... ]
On Wed, Dec 5, 2018 at 5:09 PM Andy Lutomirski <luto@kernel.org> wrote:
>
> On Wed, Dec 5, 2018 at 3:49 PM Dave Hansen <dave.hansen@intel.com> wrote:
[..]
> > We also haven't settled on the file-backed properties. For file-backed,
> > my hope was that you could do:
> >
> > ptr = mmap(fd, size, prot);
> > printf("ciphertext: %x\n", *ptr);
> > mprotect_encrypt(ptr, len, prot, keyid);
> > printf("plaintext: %x\n", *ptr);
>
> Why would you ever want the plaintext? Also, how does this work on a
> normal fs, where relocation of the file would cause the ciphertext to
> get lost? It really seems to me that it should look more like
> dm-crypt where you encrypt a filesystem. Maybe you'd just configure
> the pmem device to be encrypted before you mount it, or you'd get a
> new pmem-mktme device node instead. This would also avoid some nasty
> multiple-copies-of-the-direct-map issue, since you'd only ever have
> one of them mapped.
Yes, this is really the only way it can work. Otherwise you need to
teach the filesystem that "these blocks can't move without the key
because encryption", and have an fs-feature flag to say "you can't
mount this legacy / encryption unaware filesystem from an older kernel
because we're not sure you'll move something and break the
encryption".
So pmem namespaces (volumes) would be encrypted providing something
similar to dm-crypt, although we're looking at following the lead of
the fscrypt key management scheme.

On Mon, 2018-12-03 at 23:39 -0800, Alison Schofield wrote:
> Implement memory encryption with a new system call that is an
> extension of the legacy mprotect() system call.
>
> In encrypt_mprotect the caller must pass a handle to a previously
> allocated and programmed encryption key. Validate the key and store
> the keyid bits in the vm_page_prot for each VMA in the protection
> range.
>
> Signed-off-by: Alison Schofield <alison.schofield@intel.com>
> Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Why don't you use that NO_KEY in this patch?
/Jarkko

On Thu, 2018-12-06 at 00:51 -0800, Jarkko Sakkinen wrote:
> On Mon, 2018-12-03 at 23:39 -0800, Alison Schofield wrote:
> > MKTME (Multi-Key Total Memory Encryption) is a technology that allows
> > transparent memory encryption in upcoming Intel platforms. MKTME will
> support multiple encryption domains, each having their own key. The main
> > use case for the feature is virtual machine isolation. The API needs the
> > flexibility to work for a wide range of uses.
>
> Some, maybe brutal, honesty (apologies)...
>
> I have never really grasped why either SME or TME would make
> isolation any better. If you can break into the hypervisor, you'll
> have these tools available:
>
> 1. Read page (in encrypted form).
> 2. Write page (for example replay as pages are not versioned).
>
> with all the side-channel possibilities, of course, since you can
> control the VMs (on which core they execute, etc.).
>
> I've now seen the SME presentation three times and it always leaves
> me with an empty feeling. This feels the same.
I.e., the scenario where this will help needs to be told very
explicitly. I'm not saying that this should resolve everything, but it
must resolve something.
/Jarkko

On 12/6/18 3:22 AM, Kirill A. Shutemov wrote:
>> When you say "disable encryption to a page" does the encryption get
>> actually disabled or does the CPU just decrypt it transparently i.e.
>> what happens physically?
> Yes, it gets disabled. Physically. It overrides TME encryption.
I know MKTME itself has a runtime overhead and we expect it to have a
performance impact in the low single digits. Does TME have that
overhead? Presumably MKTME plus no-encryption is not expected to have
the overhead.
We should probably mention that in the changelogs too.

On 12/6/18 12:51 AM, Sakkinen, Jarkko wrote:
> On Mon, 2018-12-03 at 23:39 -0800, Alison Schofield wrote:
>> MKTME (Multi-Key Total Memory Encryption) is a technology that allows
>> transparent memory encryption in upcoming Intel platforms. MKTME will
> > support multiple encryption domains, each having their own key. The main
>> use case for the feature is virtual machine isolation. The API needs the
>> flexibility to work for a wide range of uses.
> Some, maybe brutal, honesty (apologies)...
>
> I have never really grasped why either SME or TME would make
> isolation any better. If you can break into the hypervisor, you'll
> have these tools available:
For systems using MKTME, the hypervisor is within the "trust boundary".
From what I've read, it is a bit _more_ trusted than with AMD's scheme.
But, yes, if you can mount a successful arbitrary code execution attack
against the MKTME hypervisor, you can defeat MKTME's protections. If
the kernel creates non-encrypted mappings of memory that's being
encrypted with MKTME, an arbitrary read primitive could also be very
valuable in defeating MKTME's protections. That's why Andy is proposing
doing something like eXclusive-Page-Frame-Ownership (google XPFO).

On 12/5/18 5:09 PM, Andy Lutomirski wrote:
> On Wed, Dec 5, 2018 at 3:49 PM Dave Hansen <dave.hansen@intel.com> wrote:
>> What I was hoping to see was them do this (apologies for the horrible
>> indentation):
>>
>> ptr = mmap(..., PROT_NONE);
>> mprotect_pkey( addr, len, PROT_NONE, pkey);
>> mprotect_encrypt(addr, len, PROT_NONE, keyid);
>> mprotect( addr, len, real_prot);
>>
>> The point is that you *can* stack these things and don't have to have an
>> mprotect_kitchen_sink() if you use PROT_NONE for intermediate
>> permissions during setup.
>
> Sure, but then why call it mprotect at all? How about:
>
> mmap(..., PROT_NONE);
> mencrypt(..., keyid);
> mprotect_pkey(...);
That would totally work too. I just like the idea of a family of
mprotect() syscalls that do mprotect() plus one other thing. What
you're proposing is totally equivalent where we have mprotect_pkey()
always being the *last* thing that gets called, plus a family of things
that we expect to get called on something that's probably PROT_NONE.
> But wouldn't this be much nicer:
>
> int fd = memfd_create(...);
> memfd_set_tme_key(fd, keyid); /* fails if len != 0 */
> mmap(fd, ...);
No. :)
One really big advantage with protection keys, or this implementation, is
that you don't have to implement an allocator. You can use it with any
old malloc() as long as you own a whole page.
The pages also fundamentally *stay* anonymous in the VM and get all the
goodness that comes with that, like THP.
>>> and it's also functionally just MADV_DONTNEED. In other words, the
>>> sole user-visible effect appears to be that the existing pages are
>>> blown away. The fact that it changes the key in use doesn't seem
>>> terribly useful, since it's anonymous memory,
>>
>> It's functionally MADV_DONTNEED, plus a future promise that your writes
>> will never show up as plaintext on the DIMM.
>
> But that's mostly vacuous. If I read the docs right, MKTME systems
> also support TME, so you *already* have that promise, unless the
> firmware totally blew it. If we want a boot option to have the kernel
> use MKTME to forcibly encrypt everything regardless of what the TME
> MSRs say, I'd be entirely on board. Heck, the implementation would be
> quite simple because we mostly reuse the SME code.
Yeah, that's true. I seem to always forget about the TME case! :)
"It's functionally MADV_DONTNEED, plus a future promise that your writes
will never be written to the DIMM with the TME key."
But, this gets us back to your very good question about what good this
does in the end. What value does _that_ scheme provide over TME? We're
admittedly weak on specific examples there, but I'm working on it.
>>> the direct map as well, probably using the pageattr.c code.
>>
>> The current, public hardware spec has a description of what's required
>> to maintain cache coherency. Basically, you can keep as many mappings
>> of a physical page as you want, but only write to one mapping at a time,
>> and clflush the old one when you want to write to a new one.
>
> Surely you at least have to clflush the old mapping and then the new
> mapping, since the new mapping could have been speculatively read.
Nope. The coherency is "fine" unless you have writeback of an older
cacheline that blows away newer data. CPUs that support MKTME are
guaranteed to never do writeback of the lines that could be established
speculatively or from prefetching.
>>> Finally, if you're going to teach the kernel how to have some user
>>> pages that aren't in the direct map, you've essentially done XPO,
>>> which is nifty but expensive. And I think that doing this gets you
>>> essentially all the benefit of MKTME for the non-pmem use case. Why
>>> exactly would any software want to use anything other than a
>>> CPU-managed key for anything other than pmem?
>>
>> It is handy, for one, to let you "cluster" key usage. If you have 5
>> Pepsi VMs and 5 Coke VMs, each Pepsi one using the same key and each
>> Coke one using the same key, you can boil it down to only 2 hardware
>> keyid slots that get used, and do this transparently.
>
> I understand this from a marketing perspective but not a security
> perspective. Say I'm Coke and you've sold me some VMs that are
> "encrypted with a Coke-specific key and no other VMs get to use that
> key." I can't think of *any* not-exceedingly-contrived attack in
> which this makes the slightest difference. If Pepsi tries to attack
> Coke without MKTME, then they'll either need to get the hypervisor to
> leak Coke's data through the direct map or they'll have to find some
> way to corrupt a page table or use something like L1TF to read from a
> physical address Coke owns. With MKTME, if they can read through the
> host direct map, then they'll get Coke's cleartext, and if they can
> corrupt a page table or use L1TF to read from your memory, they'll get
> Coke's cleartext.
The design definitely has the hypervisor in the trust boundary. If the
hypervisor is evil, or if someone evil compromises the hypervisor, MKTME
obviously provides less protection.
I guess the question ends up being if this makes its protections weak
enough that we should not bother merging it in its current form.
I still have the homework assignment to go figure out why folks want the
protections as they stand.

> On Dec 6, 2018, at 7:39 AM, Dave Hansen <dave.hansen@intel.com> wrote:
>>>> the direct map as well, probably using the pageattr.c code.
>>>
>>> The current, public hardware spec has a description of what's required
>>> to maintain cache coherency. Basically, you can keep as many mappings
>>> of a physical page as you want, but only write to one mapping at a time,
>>> and clflush the old one when you want to write to a new one.
>>
>> Surely you at least have to clflush the old mapping and then the new
>> mapping, since the new mapping could have been speculatively read.
>
> Nope. The coherency is "fine" unless you have writeback of an older
> cacheline that blows away newer data. CPUs that support MKTME are
> guaranteed to never do writeback of the lines that could be established
> speculatively or from prefetching.
How is that sufficient? Suppose I have some physical page mapped with
keys 1 and 2. #1 is logically live and I write to it. Then I prefetch
or otherwise populate mapping 2 into the cache (in the S state,
presumably). Now I clflush mapping 1 and read 2. It contains garbage
in the cache, but the garbage in the cache is inconsistent with the
garbage in memory. This can’t be a good thing, even if no writeback
occurs.
I suppose the right fix is to clflush the old mapping and then to zero
the new mapping.
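[Put together, the safe rekeying sequence for one physical page would look roughly like this. Pseudocode only; map(), clflush_range(), and rekey_page() are made-up names, not proposed kernel interfaces.]

```
/*
 * Rekey one physical page from old_keyid to new_keyid.
 * Hardware rule: any number of mappings may exist, but write through
 * only one at a time, and flush an old mapping before moving on.
 */
rekey_page(page, old_keyid, new_keyid):
    old = map(page, old_keyid)
    clflush_range(old, PAGE_SIZE)  /* no writeback of stale lines under the old key */
    unmap(old)

    new = map(page, new_keyid)
    memzero(new, PAGE_SIZE)        /* writes replace any garbage cached under the new key */
```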
>
>>>> Finally, if you're going to teach the kernel how to have some user
>>>> pages that aren't in the direct map, you've essentially done XPO,
>>>> which is nifty but expensive. And I think that doing this gets you
>>>> essentially all the benefit of MKTME for the non-pmem use case. Why
>>>> exactly would any software want to use anything other than a
>>>> CPU-managed key for anything other than pmem?
>>>
>>> It is handy, for one, to let you "cluster" key usage. If you have 5
>>> Pepsi VMs and 5 Coke VMs, each Pepsi one using the same key and each
>>> Coke one using the same key, you can boil it down to only 2 hardware
>>> keyid slots that get used, and do this transparently.
>>
>> I understand this from a marketing perspective but not a security
>> perspective. Say I'm Coke and you've sold me some VMs that are
>> "encrypted with a Coke-specific key and no other VMs get to use that
>> key." I can't think of *any* not-exceedingly-contrived attack in
>> which this makes the slightest difference. If Pepsi tries to attack
>> Coke without MKTME, then they'll either need to get the hypervisor to
>> leak Coke's data through the direct map or they'll have to find some
>> way to corrupt a page table or use something like L1TF to read from a
>> physical address Coke owns. With MKTME, if they can read through the
>> host direct map, then they'll get Coke's cleartext, and if they can
>> corrupt a page table or use L1TF to read from your memory, they'll get
>> Coke's cleartext.
>
> The design definitely has the hypervisor in the trust boundary. If the
> hypervisor is evil, or if someone evil compromises the hypervisor, MKTME
> obviously provides less protection.
>
> I guess the question ends up being if this makes its protections weak
> enough that we should not bother merging it in its current form.
Indeed, but I’d ask another question too: I expect that MKTME is weak
enough that it will be improved, and without seeing the improvement,
it seems quite plausible that using the improvement will require
radically reworking the kernel implementation.
As a straw man, suppose we get a way to say “this key may only be
accessed through such-and-such VPID or by using a special new
restricted facility for the hypervisor to request access”. Now we
have some degree of serious protection, but it doesn’t work, by
design, for anonymous memory. Similarly, something that looks more
like AMD's SEV would be very very awkward to support with anything
like the current API proposal.
>
> I still have the homework assignment to go figure out why folks want the
> protections as they stand.

On 12/6/18 11:10 AM, Andy Lutomirski wrote:
>> On Dec 6, 2018, at 7:39 AM, Dave Hansen <dave.hansen@intel.com> wrote:
>>The coherency is "fine" unless you have writeback of an older
>> cacheline that blows away newer data. CPUs that support MKTME are
>> guaranteed to never do writeback of the lines that could be established
>> speculatively or from prefetching.
>
> How is that sufficient? Suppose I have some physical page mapped with
> keys 1 and 2. #1 is logically live and I write to it. Then I prefetch
> or otherwise populate mapping 2 into the cache (in the S state,
> presumably). Now I clflush mapping 1 and read 2. It contains garbage
> in the cache, but the garbage in the cache is inconsistent with the
> garbage in memory. This can’t be a good thing, even if no writeback
> occurs.
>
> I suppose the right fix is to clflush the old mapping and then to zero
> the new mapping.
Yep. Practically, you need to write to the new mapping to give it any
meaning. Those writes effectively blow away any previously cached,
garbage contents.
I think you're right, though, that the cached data might not be
_consistent_ with what is in memory. It feels really dirty, but I can't
think of any problems that it actually causes.

On Thu, 2018-12-06 at 14:22 +0300, Kirill A. Shutemov wrote:
> > When you say "disable encryption to a page" does the encryption get
> > actually disabled or does the CPU just decrypt it transparently i.e.
> > what happens physically?
>
> Yes, it gets disabled. Physically. It overrides TME encryption.
OK, thanks for the confirmation. BTW, how big is the penalty of keeping
it always enabled? Is there some other reason it would not make sense?
/Jarkko

On Thu, 2018-12-06 at 07:11 -0800, Dave Hansen wrote:
> On 12/6/18 12:51 AM, Sakkinen, Jarkko wrote:
> > On Mon, 2018-12-03 at 23:39 -0800, Alison Schofield wrote:
> > > MKTME (Multi-Key Total Memory Encryption) is a technology that allows
> > > transparent memory encryption in upcoming Intel platforms. MKTME will
> > > support multiple encryption domains, each having their own key. The main
> > > use case for the feature is virtual machine isolation. The API needs the
> > > flexibility to work for a wide range of uses.
> > Some, maybe brutal, honesty (apologies)...
> >
> > I have never really grasped why either SME or TME would make
> > isolation any better. If you can break into the hypervisor, you'll
> > have these tools available:
>
> For systems using MKTME, the hypervisor is within the "trust boundary".
> From what I've read, it is a bit _more_ trusted than with AMD's scheme.
>
> But, yes, if you can mount a successful arbitrary code execution attack
> against the MKTME hypervisor, you can defeat MKTME's protections. If
> the kernel creates non-encrypted mappings of memory that's being
> encrypted with MKTME, an arbitrary read primitive could also be very
> valuable in defeating MKTME's protections. That's why Andy is proposing
> doing something like eXclusive-Page-Frame-Ownership (google XPFO).
Thanks, I was not aware of XPFO but I found a nice ~2 page article about it:
https://lwn.net/Articles/700647/
I think the performance hit is the necessary price to pay (if you want
something more opaque than just the usual "military grade security"). At
minimum, it should be an opt-in feature.
/Jarkko

>
> TME itself provides a ton of protection -- you can't just barge into
> the datacenter, refrigerate the DIMMs, walk away with them, and read
> off everyone's data.
>
> Am I missing something?
I think we can make such an assumption in most cases, but I think it's better that we don't make any
assumption at all. For example, the admin of the data center (or anyone) who has physical access to
the servers may do something malicious. I am not an expert, but there should be other physical attack
methods besides the cold boot attack, if the malicious employee can get physical access to a server
without being detected.
>
> >
> > But, I think what you're implying is that the security properties of
> > user-supplied keys can only be *worse* than using CPU-generated keys
> > (assuming the CPU does a good job generating it). So, why bother
> > allowing user-specified keys in the first place?
>
> That too :)
I think one usage of a user-specified key is for NVDIMM, since the CPU key will be gone after a
machine reboot; therefore, if an NVDIMM is encrypted by a CPU key, we are not able to retrieve the
data once we shutdown/reboot, etc.
There are some other use cases that already require the tenant to send a key to the CSP. For
example, the VM image can be provided by the tenant and encrypted by the tenant's own key, and the
tenant needs to send the key to the CSP when asking the CSP to run that encrypted image. But the
tenant will need to trust the CSP in such a case, which brings us back to why the tenant wants to
use his own image in the first place (I have to say I myself am not convinced of the value of such
a use case). I think there are two levels of trust involved here: 1) the tenant needs to trust the
CSP anyway; 2) but the CSP needs to convince the tenant that the CSP can be trusted, i.e., by
proving it can prevent potential attacks from a malicious employee (i.e., by raising the bar by
using MKTME), etc.
Thanks,
-Kai

On Wed, 2018-12-05 at 22:19 +0000, Sakkinen, Jarkko wrote:
> On Tue, 2018-12-04 at 11:19 -0800, Andy Lutomirski wrote:
> > I'm not Thomas, but I think it's the wrong direction. As it stands,
> > encrypt_mprotect() is an incomplete version of mprotect() (since it's
> > missing the protection key support), and it's also functionally just
> > MADV_DONTNEED. In other words, the sole user-visible effect appears
> > to be that the existing pages are blown away. The fact that it
> > changes the key in use doesn't seem terribly useful, since it's
> > anonymous memory, and the most secure choice is to use CPU-managed
> > keying, which appears to be the default anyway on TME systems. It
> > also has totally unclear semantics WRT swap, and, off the top of my
> > head, it looks like it may have serious cache-coherency issues and
> > like swapping the pages might corrupt them, both because there are no
> > flushes and because the direct-map alias looks like it will use the
> > default key and therefore appear to contain the wrong data.
> >
> > I would propose a very different direction: don't try to support MKTME
> > at all for anonymous memory, and instead figure out the important use
> > cases and support them directly. The use cases that I can think of
> > off the top of my head are:
> >
> > 1. pmem. This should probably use a very different API.
> >
> > 2. Some kind of VM hardening, where a VM's memory can be protected a
> > little tiny bit from the main kernel. But I don't see why this is any
> > better than XPO (eXclusive Page-frame Ownership), which brings to
> > mind:
>
> What is the threat model anyway for AMD and Intel technologies?
>
> For me it looks like that you can read, write and even replay
> encrypted pages both in SME and TME.
Right. Neither of them (including MKTME) prevents replay attacks. But in my understanding SEV
doesn't prevent replay attacks either, since it doesn't have integrity protection.
Thanks,
-Kai

On Mon, 2018-12-03 at 23:39 -0800, Alison Schofield wrote:
> MKTME (Multi-Key Total Memory Encryption) key payloads may include
> data encryption keys, tweak keys, and additional entropy bits. These
> are used to program the MKTME encryption hardware. By default, the
> kernel destroys this payload data once the hardware is programmed.
>
> However, in order to fully support CPU Hotplug, saving the key data
> becomes important. The MKTME Key Service cannot allow a new physical
> package to come online unless it can program the new package's Key Table
> to match the Key Tables of all existing physical packages.
>
> With CPU generated keys (a.k.a. random keys or ephemeral keys) the
> saving of user key data is not an issue. The kernel and MKTME hardware
> can generate strong encryption keys without recalling any user supplied
> data.
>
> With USER directed keys (a.k.a. user type) saving the key programming
> data (data and tweak key) becomes an issue. The data and tweak keys
> are required to program those keys on a new physical package.
>
> In preparation for adding CPU hotplug support:
>
> Add an 'mktme_vault' where key data is stored.
>
> Add 'mktme_savekeys' kernel command line parameter that directs
> what key data can be stored. If it is not set, the kernel does not
> store the user's data key or tweak key.
>
> Add 'mktme_bitmap_user_type' to track when USER type keys are in
> use. If no USER type keys are currently in use, a physical package
> may be brought online, despite the absence of 'mktme_savekeys'.
Overall, I am not sure whether saving the key is a good idea, since it defeats cold boot attack
protection IMHO. We need to trade off between supporting CPU hotplug and security. I am not sure
whether supporting CPU hotplug is that important, since for some other features such as SGX, we
don't support CPU hotplug anyway.
Alternatively, we can choose to use per-socket keyID, but not to program keyID globally across all
sockets, so you don't have to save key while still supporting CPU hotplug.
Thanks,
-Kai

On Thu, Dec 06, 2018 at 06:14:03PM -0800, Huang, Kai wrote:
> On Mon, 2018-12-03 at 23:39 -0800, Alison Schofield wrote:
8< ------------
> > Add an 'mktme_vault' where key data is stored.
> >
> > Add 'mktme_savekeys' kernel command line parameter that directs
> > what key data can be stored. If it is not set, the kernel does not
> > store the user's data key or tweak key.
> >
> > Add 'mktme_bitmap_user_type' to track when USER type keys are in
> > use. If no USER type keys are currently in use, a physical package
> > may be brought online, despite the absence of 'mktme_savekeys'.
>
> Overall, I am not sure whether saving the key is a good idea, since it defeats cold boot attack
> protection IMHO. We need to trade off between supporting CPU hotplug and security. I am not sure
> whether supporting CPU hotplug is that important, since for some other features such as SGX, we
> don't support CPU hotplug anyway.
Yes, saving the key data exposes it in a cold boot attack.
Here we have 2 conflicting requirements: do not save the data, and
support CPU hotplug. I don't think CPU hotplug support is budging!
If offering the mktme_savekeys option is too dangerous, then we can't
have user type keys.
Is the mktme_savekeys option too risky to offer?
(That's not just a question for you, Kai ;). I'll pursue it too.)
>
> Alternatively, we can choose to use per-socket keyID, but not to program keyID globally across all
> sockets, so you don't have to save key while still supporting CPU hotplug.
An alternative, with a lot of impact on the core Linux support for
MKTME. I don't think we need to go there. I'll leave this thought for
Kirill or Dave to perhaps elaborate on.
Alison
>
> Thanks,
> -Kai

On 12/6/18 5:55 PM, Huang, Kai wrote:
> I think one usage of a user-specified key is for NVDIMM, since the CPU
> key will be gone after a machine reboot; therefore, if an NVDIMM is
> encrypted by a CPU key, we are not able to retrieve the data once we
> shutdown/reboot, etc.
I think we all agree that the NVDIMM uses are really useful.
But, these patches don't implement that. So, if NVDIMMs are the only
reasonable use case, we shouldn't merge these patches until we add
NVDIMM support.

On Thu, Dec 06, 2018 at 06:14:03PM -0800, Huang, Kai wrote:
> On Mon, 2018-12-03 at 23:39 -0800, Alison Schofield wrote:
> > MKTME (Multi-Key Total Memory Encryption) key payloads may include
> > data encryption keys, tweak keys, and additional entropy bits. These
> > are used to program the MKTME encryption hardware. By default, the
> > kernel destroys this payload data once the hardware is programmed.
> >
> > However, in order to fully support CPU Hotplug, saving the key data
> > becomes important. The MKTME Key Service cannot allow a new physical
> > package to come online unless it can program the new package's Key Table
> > to match the Key Tables of all existing physical packages.
> >
> > With CPU generated keys (a.k.a. random keys or ephemeral keys) the
> > saving of user key data is not an issue. The kernel and MKTME hardware
> > can generate strong encryption keys without recalling any user supplied
> > data.
> >
> > With USER directed keys (a.k.a. user type) saving the key programming
> > data (data and tweak key) becomes an issue. The data and tweak keys
> > are required to program those keys on a new physical package.
> >
> > In preparation for adding CPU hotplug support:
> >
> > Add an 'mktme_vault' where key data is stored.
> >
> > Add 'mktme_savekeys' kernel command line parameter that directs
> > what key data can be stored. If it is not set, the kernel does not
> > store the user's data key or tweak key.
> >
> > Add 'mktme_bitmap_user_type' to track when USER type keys are in
> > use. If no USER type keys are currently in use, a physical package
> > may be brought online, despite the absence of 'mktme_savekeys'.
>
> Overall, I am not sure whether saving the key is a good idea, since it
> defeats cold boot attack protection IMHO. We need to trade off between
> supporting CPU hotplug and security. I am not sure whether supporting
> CPU hotplug is that important, since for some other features such as
> SGX, we don't support CPU hotplug anyway.
What is the application for saving the key anyway?
With my current knowledge, I'm not even sure what the application for
user-provided keys is.
/Jarkko

On Thu, Dec 06, 2018 at 06:05:50PM -0800, Huang, Kai wrote:
> On Wed, 2018-12-05 at 22:19 +0000, Sakkinen, Jarkko wrote:
> > On Tue, 2018-12-04 at 11:19 -0800, Andy Lutomirski wrote:
> > > I'm not Thomas, but I think it's the wrong direction. As it stands,
> > > encrypt_mprotect() is an incomplete version of mprotect() (since it's
> > > missing the protection key support), and it's also functionally just
> > > MADV_DONTNEED. In other words, the sole user-visible effect appears
> > > to be that the existing pages are blown away. The fact that it
> > > changes the key in use doesn't seem terribly useful, since it's
> > > anonymous memory, and the most secure choice is to use CPU-managed
> > > keying, which appears to be the default anyway on TME systems. It
> > > also has totally unclear semantics WRT swap, and, off the top of my
> > > head, it looks like it may have serious cache-coherency issues and
> > > like swapping the pages might corrupt them, both because there are no
> > > flushes and because the direct-map alias looks like it will use the
> > > default key and therefore appear to contain the wrong data.
> > >
> > > I would propose a very different direction: don't try to support MKTME
> > > at all for anonymous memory, and instead figure out the important use
> > > cases and support them directly. The use cases that I can think of
> > > off the top of my head are:
> > >
> > > 1. pmem. This should probably use a very different API.
> > >
> > > 2. Some kind of VM hardening, where a VM's memory can be protected a
> > > little tiny bit from the main kernel. But I don't see why this is any
> > > better than XPO (eXclusive Page-frame Ownership), which brings to
> > > mind:
> >
> > What is the threat model anyway for AMD and Intel technologies?
> >
> > For me it looks like that you can read, write and even replay
> > encrypted pages both in SME and TME.
>
> Right. Neither of them (including MKTME) prevents replay attack. But
> in my understanding SEV doesn't prevent replay attack either since it
> doesn't have integrity protection.
Yep, it doesn't :-) That's why, after seeing presentations concerning
SME and SEV, I've been wondering what they are good for.
Cold boot attacks are definitely at least something where these
techs can help...
/Jarkko

On Thu, 2018-12-06 at 06:59 -0800, Dave Hansen wrote:
> On 12/6/18 3:22 AM, Kirill A. Shutemov wrote:
> > > When you say "disable encryption to a page" does the encryption get
> > > actually disabled or does the CPU just decrypt it transparently i.e.
> > > what happens physically?
> >
> > Yes, it gets disabled. Physically. It overrides TME encryption.
>
> I know MKTME itself has a runtime overhead and we expect it to have a
> performance impact in the low single digits. Does TME have that
> overhead? Presumably MKTME plus no-encryption is not expected to have
> the overhead.
>
> We should probably mention that in the changelogs too.
>
I believe that in terms of hardware crypto overhead MKTME and TME should be the same (except for the
MKTME no-encrypt case?). But MKTME might have additional overhead from the software implementation in
the kernel?
Thanks,
-Kai

On Fri, Dec 07, 2018 at 02:14:03AM +0000, Huang, Kai wrote:
> Alternatively, we can choose to use per-socket keyID, but not to program
> keyID globally across all sockets, so you don't have to save key while
> still supporting CPU hotplug.
A per-socket KeyID approach would make things more complex. For instance,
a KeyID on its own will not be enough to refer to a key. You will need a
node too. It will also require a way to track whether there's a KeyID on
another node for the key.
It also makes memory management less flexible: runtime migration of
memory between nodes will be limited and it can hurt memory availability
for non-encrypted tasks too.
In general, I don't find per-socket KeyID handling very attractive. It
creates more problems than it solves.
--
Kirill A. Shutemov

On Thu, Dec 06, 2018 at 09:23:20PM +0000, Sakkinen, Jarkko wrote:
> On Thu, 2018-12-06 at 14:22 +0300, Kirill A. Shutemov wrote:
> > When you say "disable encryption to a page" does the encryption get
> > > actually disabled or does the CPU just decrypt it transparently i.e.
> > > what happens physically?
> >
> > Yes, it gets disabled. Physically. It overrides TME encryption.
>
> OK, thanks for the confirmation. BTW, how big is the penalty of keeping
> it always enabled? Is it something that would not make sense for some
> other reasons?
We don't have any numbers to share at this point.
--
Kirill A. Shutemov

On Wed, Dec 05, 2018 at 10:19:20PM +0000, Sakkinen, Jarkko wrote:
> On Tue, 2018-12-04 at 11:19 -0800, Andy Lutomirski wrote:
> > I'm not Thomas, but I think it's the wrong direction. As it stands,
> > encrypt_mprotect() is an incomplete version of mprotect() (since it's
> > missing the protection key support), and it's also functionally just
> > MADV_DONTNEED. In other words, the sole user-visible effect appears
> > to be that the existing pages are blown away. The fact that it
> > changes the key in use doesn't seem terribly useful, since it's
> > anonymous memory, and the most secure choice is to use CPU-managed
> > keying, which appears to be the default anyway on TME systems. It
> > also has totally unclear semantics WRT swap, and, off the top of my
> > head, it looks like it may have serious cache-coherency issues and
> > like swapping the pages might corrupt them, both because there are no
> > flushes and because the direct-map alias looks like it will use the
> > default key and therefore appear to contain the wrong data.
> >
> > I would propose a very different direction: don't try to support MKTME
> > at all for anonymous memory, and instead figure out the important use
> > cases and support them directly. The use cases that I can think of
> > off the top of my head are:
> >
> > 1. pmem. This should probably use a very different API.
> >
> > 2. Some kind of VM hardening, where a VM's memory can be protected a
> > little tiny bit from the main kernel. But I don't see why this is any
> > better than XPO (eXclusive Page-frame Ownership), which brings to
> > mind:
>
> What is the threat model anyway for AMD and Intel technologies?
>
> For me it looks like that you can read, write and even replay
> encrypted pages both in SME and TME.
What replay attack are you talking about? MKTME uses AES-XTS with a physical
address tweak. So the data is tied to its place in the physical address space,
and replacing one encrypted page with an encrypted page from a
different address will produce garbage on decryption.
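To make the tweak binding concrete, here is a rough userspace sketch of the idea
using Python's `cryptography` package. The key, the 4 KiB "page", and the
addresses are all made up; the real key never leaves the memory controller, and
the real tweak derivation is not architecturally visible:

```python
# Demonstrate that AES-XTS ties ciphertext to its tweak (here: the
# physical address), so moving ciphertext to a different address
# decrypts to garbage rather than the original plaintext.
import os
from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes

key = os.urandom(64)  # AES-256-XTS takes a double-length (512-bit) key

def xts(op, tweak_addr, data):
    # Model the physical address as the 16-byte XTS tweak.
    tweak = tweak_addr.to_bytes(16, "little")
    c = Cipher(algorithms.AES(key), modes.XTS(tweak))
    ctx = c.encryptor() if op == "enc" else c.decryptor()
    return ctx.update(data) + ctx.finalize()

plaintext = b"secret page data" * 256  # one 4 KiB "page"
addr_a, addr_b = 0x1000, 0x2000

ct = xts("enc", addr_a, plaintext)
assert xts("dec", addr_a, ct) == plaintext   # decrypts in place
assert xts("dec", addr_b, ct) != plaintext   # garbage at another address
```

Note that this only shows confidentiality binding to the address; there is no
integrity tag, so the "garbage" decryption is still silently accepted.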
--
Kirill A. Shutemov

On Fri, 2018-12-07 at 14:57 +0300, Kirill A. Shutemov wrote:
> > What is the threat model anyway for AMD and Intel technologies?
> >
> > For me it looks like that you can read, write and even replay
> > encrypted pages both in SME and TME.
>
> What replay attack are you talking about? MKTME uses AES-XTS with physical
> address tweak. So the data is tied to the place in physical address space and
> replacing one encrypted page with another encrypted page from different
> address will produce garbage on decryption.
Just trying to understand how this works.
So you use physical address like a nonce/version for the page and
thus prevent replay? Was not aware of this.
/Jarkko

On Fri, Dec 7, 2018 at 3:57 AM Kirill A. Shutemov <kirill@shutemov.name> wrote:
> > What is the threat model anyway for AMD and Intel technologies?
> >
> > For me it looks like that you can read, write and even replay
> > encrypted pages both in SME and TME.
>
> What replay attack are you talking about? MKTME uses AES-XTS with physical
> address tweak. So the data is tied to the place in physical address space
> and replacing one encrypted page with another encrypted page from
> different address will produce garbage on decryption.
What if you have some control over the physical addresses you write
the stolen encrypted page to? For instance, VM_Eve might manage to use
physical address space previously used by VM_Alice by getting the
hypervisor to move memory around (memory pressure, force other VMs out
via some type of DOS attack, etc.).
Say:
C is VM_Alice's clear text at hwaddr
E = mktme_encrypt(VM_Alice_key, hwaddr, C)
Eve somehow stole the encrypted bits E
Eve would need to write the page E without further encryption, to make
sure that the DRAM contains the original stolen bits E; otherwise E
would be encrypted again with VM_Eve's key and
mktme_encrypt(VM_Eve_key, hwaddr, E) would be present in the DRAM,
which is not helpful. But with MKTME under the current proposal VM_Eve
can disable encryption for a given mapping, right? (See also Note 1)
Eve gets the HV to move VM_Alice back over the same physical address,
Eve "somehow" gets VM_Alice to read that page and use its content
(which would look like a use-of-uninitialized-memory bug from
VM_Alice's perspective), and you have a replay attack?
For TME, this doesn't work as you cannot partially disable encryption,
so if Eve tries to write the stolen encrypted bits E, even in the
"right place", they get encrypted again to tme_encrypt(hwaddr, E).
Upon decryption, VM_Alice will get E, not C.
Note 1: Actually, even if MKTME did not let you disable encryption,
*if* Eve knows her own key, Eve can always write a preimage P that the
CPU encrypts to E for VM_Alice to read back and decrypt:
P = mktme_decrypt(VM_Eve_key, hwaddr, E)
This is not possible with TME as Eve doesn't know the key used by the
CPU and cannot compute P.
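A toy software model of this Note-1 trick. The `xts_enc`/`xts_dec` helpers,
keys, and addresses are all illustrative; real MKTME does the XTS pass in the
memory controller with keys software never sees:

```python
# If Eve knows her own MKTME key, she can compute P = dec(K_eve, addr, E)
# so that the memory controller's encrypt pass turns P back into the
# stolen ciphertext E, which VM_Alice then decrypts to the original
# cleartext C -- a replay without ever knowing VM_Alice's key.
import os
from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes

def xts_enc(key, addr, data):
    c = Cipher(algorithms.AES(key), modes.XTS(addr.to_bytes(16, "little")))
    e = c.encryptor()
    return e.update(data) + e.finalize()

def xts_dec(key, addr, data):
    c = Cipher(algorithms.AES(key), modes.XTS(addr.to_bytes(16, "little")))
    d = c.decryptor()
    return d.update(data) + d.finalize()

k_alice, k_eve = os.urandom(64), os.urandom(64)
hwaddr = 0x40000000

C = b"page table entry" * 256           # VM_Alice's cleartext page
E = xts_enc(k_alice, hwaddr, C)         # what lands in DRAM

# Eve steals E, then writes the preimage P through her own mapping:
P = xts_dec(k_eve, hwaddr, E)
assert xts_enc(k_eve, hwaddr, P) == E   # DRAM now holds E again

# When VM_Alice reuses that physical page, she reads back C: replay.
assert xts_dec(k_alice, hwaddr, E) == C
```

The key ingredient is that the attacker knows (or controls) one of the keys;
with TME's single hidden CPU-generated key, P cannot be computed.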

On Fri, 2018-12-07 at 13:59 -0800, Jarkko Sakkinen wrote:
> On Fri, 2018-12-07 at 14:57 +0300, Kirill A. Shutemov wrote:
> > > What is the threat model anyway for AMD and Intel technologies?
> > >
> > > For me it looks like that you can read, write and even replay
> > > encrypted pages both in SME and TME.
> >
> > What replay attack are you talking about? MKTME uses AES-XTS with physical
> > address tweak. So the data is tied to the place in physical address space
> > and
> > replacing one encrypted page with another encrypted page from different
> > address will produce garbage on decryption.
>
> Just trying to understand how this works.
>
> So you use physical address like a nonce/version for the page and
> thus prevent replay? Was not aware of this.
The brutal fact is that a physical address is an astronomical stretch
from a random value or increasing counter. Thus, it is fair to say that
MKTME provides only naive measures against replay attacks...
/Jarkko

On Fri, Dec 7, 2018 at 3:45 PM Sakkinen, Jarkko
<jarkko.sakkinen@intel.com> wrote:
>
> On Fri, 2018-12-07 at 13:59 -0800, Jarkko Sakkinen wrote:
> > On Fri, 2018-12-07 at 14:57 +0300, Kirill A. Shutemov wrote:
> > > > What is the threat model anyway for AMD and Intel technologies?
> > > >
> > > > For me it looks like that you can read, write and even replay
> > > > encrypted pages both in SME and TME.
> > >
> > > What replay attack are you talking about? MKTME uses AES-XTS with physical
> > > address tweak. So the data is tied to the place in physical address space
> > > and
> > > replacing one encrypted page with another encrypted page from different
> > > address will produce garbage on decryption.
> >
> > Just trying to understand how this works.
> >
> > So you use physical address like a nonce/version for the page and
> > thus prevent replay? Was not aware of this.
>
> The brutal fact is that a physical address is an astronomical stretch
> from a random value or increasing counter. Thus, it is fair to say that
> MKTME provides only naive measures against replay attacks...
>
And this is potentially a big deal, since there are much simpler
replay attacks that can compromise the system. For example, if I can
replay the contents of a page table, I can write to freed memory.
--Andy

> On Dec 6, 2018, at 5:55 PM, Huang, Kai <kai.huang@intel.com> wrote:
>
>
>>
>> TME itself provides a ton of protection -- you can't just barge into
>> the datacenter, refrigerate the DIMMs, walk away with them, and read
>> off everyone's data.
>>
>> Am I missing something?
>
> I think we can make such an assumption in most cases, but I think it's better that we don't make any
> assumption at all. For example, the admin of a data center (or anyone) who has physical access to
> the servers may do something malicious. I am no expert, but there should be other physical attack
> methods besides cold boot attacks, if a malicious employee can get physical access to a server w/o
> being detected.
>
>>
>>>
>>> But, I think what you're implying is that the security properties of
>>> user-supplied keys can only be *worse* than using CPU-generated keys
>>> (assuming the CPU does a good job generating it). So, why bother
>>> allowing user-specified keys in the first place?
>>
>> That too :)
>
> I think one usage of a user-specified key is for NVDIMM, since the CPU key is gone after a machine
> reboot; therefore, if the NVDIMM is encrypted with the CPU key, we are not able to retrieve the data
> after a shutdown/reboot, etc.
>
> There are some other use cases that already require the tenant to send a key to the CSP. For example,
> the VM image can be provided by the tenant and encrypted with the tenant's own key, and the tenant
> needs to send the key to the CSP when asking the CSP to run that encrypted image.
I can imagine a few reasons why one would want to encrypt one’s image.
For example, the CSP could issue a public key and state, or even
attest, that the key is wrapped and locked to particular PCRs of their
TPM or otherwise protected by an enclave that verifies that the key is
only used to decrypt the image for the benefit of a hypervisor.
I don’t see what MKTME has to do with this. The only remotely
plausible way I can see to use MKTME for this is to have the
hypervisor load a TPM (or other enclave) protected key into an MKTME
user key slot and to load customer-provided ciphertext into the
corresponding physical memory (using an MKTME no-encrypt slot). But
this has three major problems. First, it's effectively just a fancy
way to avoid one AES pass over the data. Second, a sensible scheme for
this type of VM image protection would use *authenticated* encryption
or at least verify a signature, which MKTME can't do. The third
problem is the real show-stopper, though: this scheme requires that
the ciphertext go into predetermined physical addresses, which would
be a giant mess.
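To illustrate the authenticated-encryption point with a hedged sketch: the keys,
nonces, and data below are invented, and this contrasts only the primitives
(unauthenticated XTS vs. an AEAD), not any MKTME interface:

```python
# AES-XTS (what MKTME provides) vs authenticated AES-GCM: flipping
# ciphertext bits goes unnoticed by XTS but makes GCM decryption fail.
import os
from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes
from cryptography.hazmat.primitives.ciphers.aead import AESGCM
from cryptography.exceptions import InvalidTag

data = b"VM image block.." * 256

# AES-XTS: tampered ciphertext still "decrypts" -- just to garbage.
xts_key, tweak = os.urandom(64), os.urandom(16)
enc = Cipher(algorithms.AES(xts_key), modes.XTS(tweak)).encryptor()
ct = bytearray(enc.update(data) + enc.finalize())
ct[0] ^= 0xFF                              # attacker flips some bits
dec = Cipher(algorithms.AES(xts_key), modes.XTS(tweak)).decryptor()
garbage = dec.update(bytes(ct)) + dec.finalize()
assert garbage != data                     # silent corruption, no error

# AES-GCM: the same tampering is detected and decryption refuses.
aead = AESGCM(AESGCM.generate_key(bit_length=256))
nonce = os.urandom(12)
ct2 = bytearray(aead.encrypt(nonce, data, None))
ct2[0] ^= 0xFF
try:
    aead.decrypt(nonce, bytes(ct2), None)
    tampered_accepted = True
except InvalidTag:
    tampered_accepted = False
assert not tampered_accepted
```

This is why "verify the image" cannot be bolted onto an XTS-only engine: there
is no tag for the hardware to check.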

On 12/7/18 3:53 PM, Andy Lutomirski wrote:
> The third problem is the real show-stopper, though: this scheme
> requires that the ciphertext go into predetermined physical
> addresses, which would be a giant mess.
There's a more fundamental problem than that. The tweak fed into the
actual AES-XTS operation is determined by the firmware, programmed into
the memory controller, and is not visible to software. So, not only
would you need to put stuff at a fixed physical address, the tweaks can
change from boot-to-boot, so whatever you did would only be good for one
boot.

On Fri, 2018-12-07 at 23:45 +0000, Sakkinen, Jarkko wrote:
> On Fri, 2018-12-07 at 13:59 -0800, Jarkko Sakkinen wrote:
> > On Fri, 2018-12-07 at 14:57 +0300, Kirill A. Shutemov wrote:
> > > > What is the threat model anyway for AMD and Intel technologies?
> > > >
> > > > For me it looks like that you can read, write and even replay
> > > > encrypted pages both in SME and TME.
> > >
> > > What replay attack are you talking about? MKTME uses AES-XTS with physical
> > > address tweak. So the data is tied to the place in physical address space
> > > and
> > > replacing one encrypted page with another encrypted page from different
> > > address will produce garbage on decryption.
> >
> > Just trying to understand how this works.
> >
> > So you use physical address like a nonce/version for the page and
> > thus prevent replay? Was not aware of this.
>
> The brutal fact is that a physical address is an astronomical stretch
> from a random value or increasing counter. Thus, it is fair to say that
> MKTME provides only naive measures against replay attacks...
>
> /Jarkko
Currently there's no nonce to protect a cache line, so TME/MKTME is not able to prevent the replay
attack you mentioned. Currently MKTME only involves AES-XTS-128 encryption but nothing else. But like
I said, if I understand correctly, even SEV doesn't have integrity protection, so it is not able to
prevent replay attacks either.
Thanks,
-Kai

> > There are some other use cases that already require tenant to send key to CSP. For example, the
> > VM
> > image can be provided by tenant and encrypted by tenant's own key, and tenant needs to send key
> > to
> > CSP when asking CSP to run that encrypted image.
>
>
> I can imagine a few reasons why one would want to encrypt one’s image.
> For example, the CSP could issue a public key and state, or even
> attest, that the key is wrapped and locked to particular PCRs of their
> TPM or otherwise protected by an enclave that verifies that the key is
> only used to decrypt the image for the benefit of a hypervisor.
Right. I think before the tenant releases a key to the CSP it should always use an attestation
authority to verify the trustworthiness of the compute node. I can understand that the key can be
wrapped by the TPM before being sent to the CSP, but I need to catch up on the enclave part.
The thing is, that a compute node can be trusted doesn't mean it cannot be attacked, or even that it
can prevent someone, e.g. some malicious admin, from getting the tenant key even in a legitimate way.
There are many SW components involved here. Anyway, this is not related to MKTME itself, like you
mentioned below; therefore the point is, as we already see that MKTME itself provides very weak
security protection, we need to see whether MKTME has value from the whole use case's point of view
(including all the things you mentioned above) -- we define the whole use case, we clearly state
who/what should be in the trust boundary, and what we can prevent, etc.
>
> I don’t see what MKTME has to do with this. The only remotely
> plausible way I can see to use MKTME for this is to have the
> hypervisor load a TPM (or other enclave) protected key into an MKTME
> user key slot and to load customer-provided ciphertext into the
> corresponding physical memory (using an MKTME no-encrypt slot). But
> this has three major problems. First, it's effectively just a fancy
> way to avoid one AES pass over the data. Second, sensible scheme for
> this type of VM image protection would use *authenticated* encryption
> or at least verify a signature, which MKTME can't do. The third
> problem is the real show-stopper, though: this scheme requires that
> the ciphertext go into predetermined physical addresses, which would
> be a giant mess.
My intention was to say that if we are already sending a key to the CSP, then we may prefer to use
the key for MKTME VM runtime protection as well, but like you said we may not have a real security
gain here compared to TME, so I agree we need to find one specific case to prove that.
Thanks,
-Kai

On Sat, 2018-12-08 at 09:33 +0800, Huang, Kai wrote:
> Currently there's no nonce to protect cache line so TME/MKTME is not able to
> prevent replay attack
> you mentioned. Currently MKTME only involves AES-XTS-128 encryption but
> nothing else. But like I
> said if I understand correctly even SEV doesn't have integrity protection so
> not able to prevent
> reply attack as well.
You're absolutely correct.
There's also a good paper on SEV subversion:
https://arxiv.org/pdf/1805.09604.pdf
I don't think this makes MKTME or SEV useless, but yeah, it is a
constraint that needs to be taken into consideration when finding the
best way to use these technologies in Linux.
/Jarkko

On Fri, 2018-12-07 at 15:45 -0800, Jarkko Sakkinen wrote:
> The brutal fact is that a physical address is an astronomical stretch
> from a random value or increasing counter. Thus, it is fair to say that
> MKTME provides only naive measures against replay attacks...
I'll try to summarize how I understand the high-level security
model of MKTME, because it would be a good idea to document it.
Assumptions:
1. The hypervisor has not been infiltrated.
2. The hypervisor does not leak secrets.
When (1) and (2) hold [1], we harden VMs in two different ways:
A. VMs cannot leak data to each other or can they with L1TF when HT
is enabled?
B. Protects against cold boot attacks.
Isn't this what this about in the nutshell roughly?
[1] XPFO could potentially be an opt-in feature that reduces the
damage when either of these assumptions has been broken.
/Jarkko

On Wed, Dec 12, 2018 at 7:31 AM Sakkinen, Jarkko
<jarkko.sakkinen@intel.com> wrote:
>
> On Fri, 2018-12-07 at 15:45 -0800, Jarkko Sakkinen wrote:
> > The brutal fact is that a physical address is an astronomical stretch
> > from a random value or increasing counter. Thus, it is fair to say that
> > MKTME provides only naive measures against replay attacks...
>
> I'll try to summarize how I understand the high level security
> model of MKTME because (would be good idea to document it).
>
> Assumptions:
>
> 1. The hypervisor has not been infiltrated.
> 2. The hypervisor does not leak secrets.
>
> When (1) and (2) hold [1], we harden VMs in two different ways:
>
> A. VMs cannot leak data to each other or can they with L1TF when HT
> is enabled?
I strongly suspect that, on L1TF-vulnerable CPUs, MKTME provides no
protection whatsoever. It sounds like MKTME is implemented in the
memory controller -- as far as the rest of the CPU and the cache
hierarchy are concerned, the MKTME key selection bits are just part of
the physical address. So an attack like L1TF that leaks a cacheline
that's selected by physical address will leak the cleartext if the key
selection bits are set correctly.
(I suppose that, if the attacker needs to brute-force the physical
address, then MKTME makes it a bit harder because the effective
physical address space is larger.)
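A toy sketch of how KeyID bits riding in the upper physical-address bits can be
modeled. The bit widths here are hypothetical, not taken from any real CPUID or
IA32_TME_ACTIVATE layout; the point is only that "same page, different key"
looks like two distinct physical lines to the cache hierarchy:

```python
# Model MKTME KeyIDs stolen from the top of the physical address space.
PHYS_BITS = 46
KEYID_BITS = 6                       # hypothetical configuration
ADDR_BITS = PHYS_BITS - KEYID_BITS   # bits left for the actual address

def make_hpa(keyid, addr):
    # Compose the "host physical address" the caches and L1TF-style
    # leaks operate on: KeyID in the high bits, real address below.
    assert keyid < (1 << KEYID_BITS) and addr < (1 << ADDR_BITS)
    return (keyid << ADDR_BITS) | addr

page = 0x1234_5000
assert make_hpa(0, page) != make_hpa(1, page)        # distinct lines
assert make_hpa(1, page) & ((1 << ADDR_BITS) - 1) == page
```

This also shows the brute-force remark above: the attacker's search space grows
by the KeyID bits, but nothing in the cache itself is encrypted.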
> B. Protects against cold boot attacks.
TME does this, AFAIK. MKTME does, too, unless the "user" mode is
used, in which case the protection is weaker.
>
> Isn't this what this about in the nutshell roughly?
>
> [1] XPFO could potentially be an opt-in feature that reduces the
> damage when either of these assumptions has been broken.
>
> /Jarkko

On Wed, 2018-12-12 at 08:29 -0800, Andy Lutomirski wrote:
> On Wed, Dec 12, 2018 at 7:31 AM Sakkinen, Jarkko
> <jarkko.sakkinen@intel.com> wrote:
> > On Fri, 2018-12-07 at 15:45 -0800, Jarkko Sakkinen wrote:
> > > The brutal fact is that a physical address is an astronomical stretch
> > > from a random value or increasing counter. Thus, it is fair to say that
> > > MKTME provides only naive measures against replay attacks...
> >
> > I'll try to summarize how I understand the high level security
> > model of MKTME because (would be good idea to document it).
> >
> > Assumptions:
> >
> > 1. The hypervisor has not been infiltrated.
> > 2. The hypervisor does not leak secrets.
> >
> > When (1) and (2) hold [1], we harden VMs in two different ways:
> >
> > A. VMs cannot leak data to each other or can they with L1TF when HT
> > is enabled?
>
> I strongly suspect that, on L1TF-vulnerable CPUs, MKTME provides no
> protection whatsoever. It sounds like MKTME is implemented in the
> memory controller -- as far as the rest of the CPU and the cache
> hierarchy are concerned, the MKTME key selection bits are just part of
> the physical address. So an attack like L1TF that leaks a cacheline
> that's selected by physical address will leak the cleartext if the key
> selection bits are set correctly.
>
> (I suppose that, if the attacker needs to brute-force the physical
> address, then MKTME makes it a bit harder because the effective
> physical address space is larger.)
>
> > B. Protects against cold boot attacks.
>
> TME does this, AFAIK. MKTME does, too, unless the "user" mode is
> used, in which case the protection is weaker.
>
> > Isn't this what this about in the nutshell roughly?
> >
> > [1] XPFO could potentially be an opt-in feature that reduces the
> > damage when either of these assumptions has been broken.
This all should be summarized in the documentation (high-level model
and corner cases).
/Jarkko

> I strongly suspect that, on L1TF-vulnerable CPUs, MKTME provides no
> protection whatsoever. It sounds like MKTME is implemented in the
> memory controller -- as far as the rest of the CPU and the cache hierarchy
> are concerned, the MKTME key selection bits are just part of the physical
> address. So an attack like L1TF that leaks a cacheline that's selected by
> physical address will leak the cleartext if the key selection bits are set
> correctly.
Right. MKTME doesn't prevent cache-based attacks. Data in the cache is in the clear.
Thanks,
-Kai

> This all should be summarized in the documentation (high-level model and
> corner cases).
I am not sure whether it is necessary to document L1TF explicitly, since it is quite obvious that MKTME doesn't prevent it. IMHO, if needed, we only need to mention that MKTME doesn't prevent any sort of cache-based attack, since data in the cache is in the clear.
In fact SGX doesn't prevent this either..
Thanks,
-Kai

On Thu, 2018-12-13 at 07:27 +0800, Huang, Kai wrote:
> > This all should be summarized in the documentation (high-level model and
> > corner cases).
>
> I am not sure whether it is necessary to document L1TF explicitly, since it is
> quite obvious that MKTME doesn't prevent that. IMHO if needed we only need to
> mention MKTME doesn't prevent any sort of cache based attack, since data in
> cache is in clear.
>
> In fact SGX doesn't prevent this either..
Sorry, was a bit unclear. I meant the assumptions and goals.
/Jarkko

On Thu, 2018-12-13 at 07:49 +0200, Jarkko Sakkinen wrote:
> On Thu, 2018-12-13 at 07:27 +0800, Huang, Kai wrote:
> > > This all should be summarized in the documentation (high-level model and
> > > corner cases).
> >
> > I am not sure whether it is necessary to document L1TF explicitly, since it
> > is
> > quite obvious that MKTME doesn't prevent that. IMHO if needed we only need
> > to
> > mention MKTME doesn't prevent any sort of cache based attack, since data in
> > cache is in clear.
> >
> > In fact SGX doesn't prevent this either..
>
> Sorry, was a bit unclear. I meant the assumptions and goals.
I.e. what I put in my earlier response: what belongs to the TCB and what
types of adversaries we aim to protect against.
/Jarkko