Currently, the cost of the tracepoint is a global memory read, a compare, and then a jump. On the x86 systems I've tested, this averages anywhere between 40 and 100 cycles per tracepoint. On top of that, there is the icache overhead of the extra instructions that we skip over; I'm not sure how to measure that beyond looking at their size.

I've proposed a 'jump label' set of patches, which essentially hardcodes a jump around the disabled code (avoiding the memory reference). However, this introduces a high 'write' cost, in that we code-patch the jmp into a 'jmp 0' to enable the code.

Along with this optimization, I'm also looking into a method for moving the disabled text into a 'cold' text section, to reduce the icache overhead. Using these techniques, we can reduce the disabled case to essentially a couple of cycles per tracepoint.

In the case where a tracepoint is always on, we wouldn't want to move the tracepoint text to a cold section. Thus, I could introduce a default enabled/disabled bias for each tracepoint.

However, in introducing such a feature, we would essentially be forcing an always-on or always-off usage pattern, since the switch cost is high. So I want to be careful not to limit the usefulness of tracepoints with such an optimization.