https://software.intel.com/es-es/forums/topic/328522/feed
es | Thanks a lot for the detailed | https://software.intel.com/es-es/comment/1672273#comment-1672273
<a id="comment-1672273"></a>
<div class="field field-name-comment-body field-type-text-long field-label-hidden"><div class="field-items"><div class="field-item even" property="content:encoded"><p>Thanks a lot for the detailed info. I will try out your suggestions. My main goal right now is to determine how much time is lost to stalls caused by memory contention (when multiple cores are working). Thanks again.</p>
</div></div></div>
Thu, 04 Oct 2012 23:52:48 +0000 | shouvikbardhan | comment 1672273 at https://software.intel.com
The big problem in this sort | https://software.intel.com/es-es/comment/1671904#comment-1671904
<a id="comment-1671904"></a>
<div class="field field-name-comment-body field-type-text-long field-label-hidden"><div class="field-items"><div class="field-item even" property="content:encoded"><p>The big problem in this sort of analysis is attributing stalls to specific causes when there are multiple underlying stall conditions.</p>
<p>There are also multiple functional units in any modern processor, so you have to decide whether a stall on one or more units is really a stall if other units are able to get work done (or initiate new work) in the same cycle.</p>
<p>A good performance analysis overview for processors using the Nehalem/Westmere cores is at<br />
<a href="http://software.intel.com/sites/products/collateral/hpc/vtune/performance_analysis_guide.pdf">http://software.intel.com/sites/products/collateral/hpc/vtune/performanc...</a></p>
<p>Most of the analysis is applicable to Sandy Bridge cores, though many of the specific performance counter events have changed.</p>
<p>The Intel 64 and IA-32 Architectures Software Developer's Manual, Volume 3, lists all of the performance monitoring events for the various processor families.<br />
For Sandy Bridge, performance monitoring event 0Eh, mask 01h, counts the number of uops issued each cycle. The notes for that entry in Table 19-3 tell which bits to set to make the event count cycles in which zero uops are *issued*. Similarly, event C2h, mask 01h, can be used to count the number of cycles in which no uops are *retired*. Either of these can be considered a "stall" condition, but the two events overlap heavily, so you should use one or the other, not their sum. (Perhaps the larger of the two?)</p>
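<p>As a rough sketch of what "which bits to set" means in practice, the snippet below packs an event number and mask into the architectural IA32_PERFEVTSELx register layout (event select in bits 0-7, unit mask in bits 8-15, USR bit 16, OS bit 17, EN bit 22, INV bit 23, counter mask in bits 24-31). The helper name <code>perfevtsel</code> and its parameters are made up for illustration. Using counter mask 1 together with the invert bit to count zero-uop cycles is my assumption about the table notes, so check it against Table 19-3 before relying on it.</p>

```python
# Bit positions in IA32_PERFEVTSELx (architectural performance monitoring layout).
USR = 1 << 16   # count while in user mode (CPL > 0)
OS = 1 << 17    # count while in kernel mode (CPL = 0)
EN = 1 << 22    # enable the counter
INV = 1 << 23   # invert the counter-mask comparison


def perfevtsel(event, umask, cmask=0, inv=False, user=True, kernel=True, enable=True):
    """Pack an event select value. Hypothetical helper, not an Intel API."""
    for name, field in (("event", event), ("umask", umask), ("cmask", cmask)):
        if not 0 <= field <= 0xFF:
            raise ValueError(f"{name} must fit in 8 bits")
    value = event | (umask << 8) | (cmask << 24)
    if user:
        value |= USR
    if kernel:
        value |= OS
    if enable:
        value |= EN
    if inv:
        value |= INV
    return value


# Assumption: cmask=1 with inv=1 turns "uops per cycle" into "cycles with zero uops".
ZERO_UOPS_ISSUED = perfevtsel(0x0E, 0x01, cmask=1, inv=True)
ZERO_UOPS_RETIRED = perfevtsel(0xC2, 0x01, cmask=1, inv=True)

print(hex(ZERO_UOPS_ISSUED), hex(ZERO_UOPS_RETIRED))
```

<p>The resulting values are what you would write to the event select MSR (or adapt to your profiling tool's raw-event syntax). Without the counter mask and invert bit, the same events count uops rather than stalled cycles.</p>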
<p>Also on Sandy Bridge, events 59h, 5Bh, 87h, and A2h count some specific stall conditions.<br />
The most useful are probably event A2h with mask 02h, which counts stalls due to a lack of free load buffers, and event A2h with mask 08h, which counts stalls due to a lack of free store buffers.</p>
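<p>Once you have the raw counts, one possible way to summarize them is sketched below. It takes the larger of the zero-issue and zero-retire cycle counts as the overall stall estimate (rather than their sum, since the two overlap) and reports the load-buffer and store-buffer stall counts as fractions of total cycles. The function and argument names are hypothetical, and the counts are assumed to come from the same run over the same interval.</p>

```python
def stall_summary(cycles, no_issue_cycles, no_retire_cycles,
                  lb_stall_cycles, sb_stall_cycles):
    """Summarize stall counters as fractions of total core cycles.

    Hypothetical helper: all arguments are raw counter readings collected
    over the same interval (zero-issue cycles, zero-retire cycles,
    event A2h mask 02h, and event A2h mask 08h).
    """
    if cycles <= 0:
        raise ValueError("cycles must be positive")
    # The zero-issue and zero-retire counts overlap heavily, so take the
    # larger of the two instead of adding them.
    overall = max(no_issue_cycles, no_retire_cycles)
    return {
        "stall_fraction": overall / cycles,
        "load_buffer_fraction": lb_stall_cycles / cycles,
        "store_buffer_fraction": sb_stall_cycles / cycles,
    }
```

<p>The load-buffer and store-buffer fractions are not meant to add up to the overall stall fraction. They only point at two of the possible causes, and a single stalled cycle can have more than one underlying cause.</p>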
<p>It can require a lot of knowledge of the microarchitecture to understand what these events mean in detail. Intel provides some of this information, but it is spread across a lot of documents.</p>
<p>The best source of general information on how to optimize to eliminate stalls (including info for Sandy Bridge) is probably the Intel 64 and IA-32 Architectures Optimization Reference Manual (document 248966; I use revision 026 from April 2012).</p>
</div></div></div>
Thu, 04 Oct 2012 17:55:14 +0000 | jdmccalpin | comment 1671904 at https://software.intel.com
I have found some good info | https://software.intel.com/es-es/comment/1670060#comment-1670060
<a id="comment-1670060"></a>
<div class="field field-name-comment-body field-type-text-long field-label-hidden"><div class="field-items"><div class="field-item even" property="content:encoded"><p>I have found some good info here, albeit for Itanium. Now I need to find similar info for Sandy Bridge, which is my processor.<br />
<a href="http://software.intel.com/en-us/articles/characterize-application-performance-with-stall-events-on-64-bit-architecture">http://software.intel.com/en-us/articles/characterize-application-perfor...</a></p>
</div></div></div>
Thu, 04 Oct 2012 04:41:52 +0000 | shouvikbardhan | comment 1670060 at https://software.intel.com