P4 Optimization Reference Manual

1) The new P4/X OptRefMan states that the L1 non-int load latency is now 9 cycles instead of 6; this makes it higher than the 7-cycle latency listed for L2. Does that mean that L1 hits are slower than L1 misses?

2) The RCPPS instruction listed in App C (the lat/thr table) has its execution unit given as MMX_MISC; however, it cannot be issued in the cycles before and after some other instructions, e.g. xorps (MMX_ALU). Does that mean that:
a) MMX_MISC is actually using other units?
b) this is caused by a microcode transition, or by something else not connected to the execution units?
c) MMX_MISC blocks other units?

3) I think there is a misprint in the P4/X OptRefMan, App C, SSE2DPFP, page C-8, 4th line of the table: it says MULSS. I think it should be MULSD.

BTW, big thanks to your team for correcting divps/-ss/sqrtps/-ss latencies and adding new entries in the lat/thr tables and the HTT.

1) L1 non-int latency is now stated as 9 cycles; that's fine, 9 cycles is true for Northwood. But how were these numbers taken? How were the 7/7 cycle latency numbers for L2 taken? Shouldn't the L2 latency be in the 18/18 range? BTW, what about stating DTLB latency along with the L2 results for the 256kB+ range?
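
For context on how such load latencies can be checked independently: a common approach is a dependent pointer chase, where each load's address comes from the previous load, so the time per step approximates load-to-use latency for whichever cache level the buffer fits in. The following is only a minimal sketch under my own assumptions, not the method used for the manual. The names build_chain, chase and ns_per_load are made up for illustration, timing uses the coarse standard clock(), and the elements are word-sized, so some steps land in an already-fetched cache line.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <time.h>

/* Hypothetical helper: fill next[] with a random single-cycle permutation
   (Sattolo's algorithm), so following next[] visits every element once
   before returning to 0. Requires n >= 2. Note: rand() has a limited range
   on some platforms, so the shuffle is less random for very large n. */
void build_chain(size_t *next, size_t n, unsigned seed)
{
    assert(n >= 2);
    for (size_t i = 0; i < n; i++)
        next[i] = i;
    srand(seed);
    for (size_t i = n - 1; i > 0; i--) {
        size_t j = (size_t)rand() % i;      /* j in [0, i-1], never i itself */
        size_t t = next[i];
        next[i] = next[j];
        next[j] = t;
    }
}

/* Follow the chain for `steps` dependent loads, starting at index 0. */
size_t chase(const size_t *next, size_t steps)
{
    size_t p = 0;
    for (size_t s = 0; s < steps; s++)
        p = next[p];                        /* each load depends on the last */
    return p;
}

/* Hypothetical helper: average nanoseconds per dependent load over a buffer
   of `bytes` bytes. Returns -1.0 if the buffer is too small or allocation
   fails. Multiply by the core clock in GHz to get cycles. */
double ns_per_load(size_t bytes, size_t steps)
{
    size_t n = bytes / sizeof(size_t);
    if (n < 2 || steps == 0)
        return -1.0;
    size_t *next = malloc(n * sizeof *next);
    if (next == NULL)
        return -1.0;
    build_chain(next, n, 1u);

    volatile size_t sink = chase(next, n);  /* warm-up pass touches the buffer */
    clock_t t0 = clock();
    sink = chase(next, steps);              /* volatile sink keeps the loop alive */
    clock_t t1 = clock();
    (void)sink;

    free(next);
    return 1e9 * (double)(t1 - t0) / (double)CLOCKS_PER_SEC / (double)steps;
}
```

Calling ns_per_load with a buffer that fits in L1 (say 4 kB), then one that only fits in L2 (say 128 kB), with a large step count, and converting to cycles should show whether the L1 and L2 figures are being measured on the same basis. A cycle-accurate version would use the time-stamp counter instead of clock().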

Is there any way that I can contact the Optimization Reference Manual writers team?