The idea is simple: there's a bit, per APIC, in guest memory, that tells the guest that it does not need EOI. The guest tests it using a single test-and-clear operation (this is necessary so that the host can detect interrupt nesting) and, if the bit is set, it can skip the EOI MSR write.
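As a rough userspace illustration of the handshake described above (not the kernel code itself), the sketch below simulates one shared word per vCPU. All names here (vcpu_sim, host_inject, guest_eoi, pv_eoi_demo, PV_EOI_BIT) are hypothetical, and the host-side policy of setting the bit only when no EOI is needed is an assumption made for the example.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical bit index, for illustration only. */
#define PV_EOI_BIT 0UL

/* One simulated vCPU: the shared word plus a counter of "real" EOI writes. */
struct vcpu_sim {
    unsigned long pv_eoi;    /* word shared between simulated host and guest */
    unsigned int eoi_exits;  /* EOI writes that would have caused an exit */
};

/* Non-atomic test-and-clear: return the old bit value and clear the bit. */
static bool test_and_clear(unsigned long *word, unsigned long bit)
{
    unsigned long mask = 1UL << bit;
    bool was_set = (*word & mask) != 0;

    *word &= ~mask;
    return was_set;
}

/* Simulated host (assumed policy): set the bit only when EOI can be skipped. */
static void host_inject(struct vcpu_sim *v, bool eoi_needed)
{
    if (eoi_needed)
        v->pv_eoi &= ~(1UL << PV_EOI_BIT);
    else
        v->pv_eoi |= 1UL << PV_EOI_BIT;
}

/* Simulated guest: skip the expensive EOI write if the bit was set. */
static void guest_eoi(struct vcpu_sim *v)
{
    if (test_and_clear(&v->pv_eoi, PV_EOI_BIT))
        return;          /* bit was set: no EOI write, no exit */
    v->eoi_exits++;      /* bit was clear: fall back to the real EOI write */
}

/* Two interrupts: the first skips EOI, the second needs it. */
unsigned int pv_eoi_demo(void)
{
    struct vcpu_sim v = { 0, 0 };

    host_inject(&v, false);
    guest_eoi(&v);
    host_inject(&v, true);
    guest_eoi(&v);
    return v.eoi_exits;
}
```

Because the guest always clears the bit when it reads it, a bit that is still set when the simulated host looks again means the guest has not yet reached its EOI path for that interrupt.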

I ran a simple microbenchmark to show the exit reduction (note: for testing, you need to apply the follow-up patch 'kvm: host side for eoi optimization' plus a qemu patch I posted separately, on the host):

 /* The last 8 bits are used to indicate how to interpret the flags field
  * in pvclock structure. If no bits are set, all flags are ignored.
@@ -37,6 +38,7 @@
 #define MSR_KVM_SYSTEM_TIME_NEW 0x4b564d01
 #define MSR_KVM_ASYNC_PF_EN 0x4b564d02
 #define MSR_KVM_STEAL_TIME 0x4b564d03
+#define MSR_KVM_PV_EOI_EN 0x4b564d04

+/* size alignment is implied but just to make it explicit. */
+static DEFINE_PER_CPU(unsigned long, kvm_apic_eoi) __aligned(2) =
+	KVM_PV_EOI_DISABLED;
+
+static void kvm_guest_apic_eoi_write(u32 reg, u32 val)
+{
+	/**
+	 * This relies on __test_and_clear_bit to modify the memory
+	 * in a way that is atomic with respect to the local CPU.
+	 * The hypervisor only accesses this memory from the local CPU so
+	 * there's no need for lock or memory barriers.
+	 * An optimization barrier is implied in apic write.
+	 */
+	if (__test_and_clear_bit(KVM_PV_EOI_BIT, &__get_cpu_var(kvm_apic_eoi)))
+		return;
+	apic->write(APIC_EOI, APIC_EOI_ACK);
+}
+
 void __cpuinit kvm_guest_cpu_init(void)
 {
 	if (!kvm_para_available())
@@ -300,11 +320,17 @@ void __cpuinit kvm_guest_cpu_init(void)
 			smp_processor_id());
 	}