Mathieu Desnoyers wrote:
> * Masami Hiramatsu (mhiramat@redhat.com) wrote:
>> Mathieu Desnoyers wrote:
>>> * Masami Hiramatsu (mhiramat@redhat.com) wrote:
>>>> Mathieu Desnoyers wrote:
>>>>> * Masami Hiramatsu (mhiramat@redhat.com) wrote:
>>>>>> Use text_poke_smp_batch() in optimization path for reducing
>>>>>> the number of stop_machine() issues.
>>>>>>
>>>>>> Signed-off-by: Masami Hiramatsu <mhiramat@redhat.com>
>>>>>> Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
>>>>>> Cc: Ingo Molnar <mingo@elte.hu>
>>>>>> Cc: Jim Keniston <jkenisto@us.ibm.com>
>>>>>> Cc: Jason Baron <jbaron@redhat.com>
>>>>>> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
>>>>>> ---
>>>>>>
>>>>>>  arch/x86/kernel/kprobes.c |   37 ++++++++++++++++++++++++++++++-------
>>>>>>  include/linux/kprobes.h   |    2 +-
>>>>>>  kernel/kprobes.c          |   13 +------------
>>>>>>  3 files changed, 32 insertions(+), 20 deletions(-)
>>>>>>
>>>>>> diff --git a/arch/x86/kernel/kprobes.c b/arch/x86/kernel/kprobes.c
>>>>>> index 345a4b1..63a5c24 100644
>>>>>> --- a/arch/x86/kernel/kprobes.c
>>>>>> +++ b/arch/x86/kernel/kprobes.c
>>>>>> @@ -1385,10 +1385,14 @@ int __kprobes arch_prepare_optimized_kprobe(struct optimized_kprobe *op)
>>>>>>  	return 0;
>>>>>>  }
>>>>>>
>>>>>> -/* Replace a breakpoint (int3) with a relative jump. */
>>>>>> -int __kprobes arch_optimize_kprobe(struct optimized_kprobe *op)
>>>>>> +#define MAX_OPTIMIZE_PROBES 256
>>>>>
>>>>> So what kind of interrupt latency does a 256-probes batch generate on the
>>>>> system ? Are we talking about a few milliseconds, a few seconds ?
>>>>
>>>> From my experiment on kvm/4cpu, it took about 3 seconds in average.
>>>
>>> That's 3 seconds for multiple calls to stop_machine(). So we can expect
>>> latencies in the area of few microseconds for each call, right ?
>>
>> Theoretically yes.
>> But if we register more than 1000 probes at once, it's hard to do
>> anything except optimizing a while(more than 10 sec), because
>> it stops machine so frequently.
>>
>>>> With this patch, it went down to 30ms. (x100 faster :))
>>>
>>> This is beefing up the latency from few microseconds to 30ms. It sounds like a
>>> regression rather than a gain to me.
>>
>> If it is not acceptable, I can add a knob for control how many probes
>> optimize/unoptimize at once. Anyway, it is expectable latency (after
>> registering/unregistering probes) and it will be small if we put a few probes.
>> (30ms is the worst case)
>> And if you want, it can be disabled by sysctl.
>
> I think we are starting to see the stop_machine() approach is really limiting
> our ability to do even relatively small amount of work without hurting
> responsiveness significantly.
>
> What's the current showstopper with the breakpoint-bypass-ipi approach that
> solves this issue properly and makes this batching approach unnecessary ?

We still do not have any official answer from chip vendors.
As you know, basic implementation has been done.