Want to get sub-ms response times for SAP applications into DB2 (mreede, TK-Guide, 2015-06-25T11:39:19-04:00)
<p dir="ltr">Do you care about low latency between AIX and the mainframe? If latency is higher than expected, it might be due to <a href="https://ibm.biz/BdXGhu" target="_blank">CPU&nbsp;folding</a>: https://ibm.biz/BdXGhu</p>
<p dir="ltr">SAP applications running on AIX via DB2 CLI were not meeting the target of average RTTs below 1 ms into DB2 on z/OS over a dedicated 10GbE infrastructure. The AIX LPARs were running on Power 7 with VIOS. A tcpdump taken on the SAP AIX LPAR showed spiky RTTs from DB2 but also higher RTTs into AIX itself (looking from the tcpdump capture point in the same LPAR...<br>
... This was one lab exercise of the <span style="color:#000080;"><strong><span style="background-color:#FFFF00;">wireshark bootcamp 2015 ZOWIE0DE</span></strong></span> ...</p>
<p dir="ltr">Interested in another iteration of the bootcamp? Join the <a href="https://www.ibm.com/developerworks/community/groups/service/html/communitystart?communityUuid=f0c70e7a-14af-46c2-91b8-4d90b35f19da" target="_blank"><span style="color:#000080;"><strong>IP&nbsp;wiZards&nbsp;Community</strong></span></a> and leave a message on my profile.</p>
<p dir="ltr">Regards, Matthias Burkhard&nbsp; <a href="https://ibm.biz/mrEEde_dW/">https://ibm.biz/mrEEde_dW/</a></p>
<p dir="ltr">&nbsp;</p>
<hr dir="ltr"></hr>
<p dir="ltr">... during the trace, CPU folding was temporarily disabled to see whether it had an influence on the overall latency. At the end of the trace, CPU folding was changed back to its default...<br>
<a href="https://www.ibm.com/developerworks/community/blogs/9fae1b19-b191-45b5-8f8c-69be172e9798/resource/BLOGS_UPLOADED_IMAGES/Screenshot-265.png" target="_blank"><img alt="image" src="https://www.ibm.com/developerworks/community/blogs/9fae1b19-b191-45b5-8f8c-69be172e9798/resource/BLOGS_UPLOADED_IMAGES/Screenshot-265.png" style=" display:block; margin: 1em 1em 0pt 0pt; float: left;"></img></a></p>
AIX Virtual Processor Folding is Misunderstood (nagger, AIXpert Blog, 2011-08-08T08:42:23-04:00)
<div> This mysterious AIX CPU Folding area is often misunderstood, so below is what I have picked up by osmosis from talking to various guru-level developers over the last 10 years. Shared Processor virtual machines (LPARs for the old fashioned) have a setting called Virtual Processors (or VP for short). This is the number of physical CPUs that the virtual machine can spread out across - in fact, I prefer to call it the &quot;spreading factor&quot; as it is much more obvious what it means. This <u>can</u> be the upper threshold for the number of CPUs that can be used by your virtual machine:<br /><ul><li>For Capped virtual machines, the Entitlement is the maximum number of compute cycles that can be used. </li><li>For Uncapped virtual machines, if there are &quot;spare&quot; CPU cycles in the Shared CPU Pool (or other virtual machines are not requiring their full entitlement), then the VP number is the limit on the total number of CPU cycles that can be used.</li></ul></div><div> But AIX does not automatically spread out over all the available virtual processors if it does not have to, as that is not efficient. If an Uncapped virtual machine has, for example, an Entitlement of 8 CPUs and a Virtual Processor count of 10 CPUs but at the moment only needs 2.5 CPUs to easily provide CPU time to all the running processes/programs, then it &quot;folds&quot; away the unneeded 7 virtual processors and runs on just 3. <br /><br /><font color="#0000ff"><b> Why fold?</b></font> I am not an AIX kernel developer (well, not now anyway) nor a spokesperson for the development team, but there is not a lot of information about folding. I think this is because it is an AIX kernel internal optimisation and so there is no need to make it public or document it. I first noticed folding taking place while developing nmon. 
On the Power 5, 6 and 7 machines, we could clearly see that AIX would stop scheduling work to some virtual processors when they were not needed. I thought &quot;these developers are very clever&quot; and was impressed. There are multiple reasons to fold:</div><div><ol><li>If you schedule processes to a lower number of physical CPUs (cores) then the memory is cached better (often described as the caches being hotter) in the Level 1, 2 and 3 memory caches = hotter caches mean more progress, as data is loaded into CPU registers a lot faster.</li><li>The unused physical CPUs can carry on running other virtual machine workloads without interruption = making them more efficient.</li><li>The virtual machine is no longer scheduled onto a CPU it does not need, just to perform pointless context switches and then give up the CPU milliseconds later = more work done.</li></ol></div><div> <font color="#0000ff"><b>Detecting folding</b></font>: We also noted that these folded virtual processors do actually get some CPU cycles. nmon used the libperfstat AIX library to extract the performance numbers and we found that instead of billions of clock ticks on the processor these folded ones have a few - in the 50 to 100 ticks sort of range. We think this is a tiny amount of housekeeping going on, like collecting the processor hardware stats or perhaps a clock interrupt. I also think that if a processor starts a device driver to action an adapter to perform an operation, then the later interrupt is returned to the same processor (which might now be folded). This is a little guesswork on my part so don't quote me - please! Back to nmon: there are no official AIX folding statistics from any interface that I have come across. nmon deduces the folding number by monitoring the physical clock ticks and heuristically (I like that word) determines whether each processor is folded or not. 
In C programming terms: is the number of ticks below a threshold? We worked that threshold out empirically (another good word) by sitting there and watching the numbers while tweaking the number of processes running.</div><div> </div><div><b><font color="#0000ff"> Now for the complicated bit - SMT!</font></b> When a virtual processor is folded all the <span class="st">Simultaneous MultiThreading (<i>SMT</i>) threads are switched off together - the different threads are always running in the same virtual machine. This made the heuristics in the nmon code tricky as it has to check that all the threads on the same CPU are doing practically nothing. This was written when Power6 was the current processor and SMT=2 was the maximum. Of course, when Power7 came along with SMT=4 the code had to be reworked - in fact there is an nmon release that runs on Power7 where the code had not been updated and gets the folding count hopelessly wrong, but it's fairly obvious when you see it (you can't fold away more CPUs than you actually have!) - install a service pack to fix this.<br /><br /><font color="#0000ff"><b> What does AIX actually do when it folds?</b><font color="#000000"> We have seen that the virtual processors are still there, are seen by AIX and occasionally run for a few clock cycles. So they are not stopped or released. <b>AIX is simply no longer scheduling processes to run on them</b>. This gives the hypervisor a hint that at the moment it does not have to schedule the virtual machine on them. The AIX kernel to Hypervisor interface and mechanisms are definitely a secret, so I have no clues at this level.<br /><br /><font color="#0000ff"><b> When to fold?</b></font> AIX does this slowly: it seems to monitor the CPU use for a period of time (from my observations a few seconds, which is a long, long time in CPU terms) and determines that the CPU use is steady or dropping and that it is safe and efficient to fold a CPU off. 
It then waits to see what happens next and may then fold away a further CPU. It is just maths that larger virtual machines with many 100's or 1000's of active processes and threads of execution do not have sudden jumps in CPU requirements; instead the workload sort of &quot;flows in and out&quot; and CPU use is smoothed out.<br /><br /><font color="#0000ff"><b> When to unfold?</b></font> AIX again monitors (it is probably the same algorithm) the CPU use. When it notices consistently high use of the currently unfolded CPUs, it decides that unfolding could help the throughput of the processes and unfolds a CPU. If there is a sudden peak in runnable processes it does not immediately unfold. This is because it can already deal successfully with short-term transitory peaks as normal via the run queues. In fact, over-aggressive unfolding could slow the applications down. When a CPU is unfolded its caches will be empty, so when a happy process on the running CPUs with hot caches is moved to the newly unfolded one with cold caches, it will spend the next few milliseconds loading the cache with program, data, stack and heap memory lines before it can get back to full speed. If we then find the temporary peak has passed, AIX will fold the CPU again - it was all rather a waste of time and the process that moved twice actually got slower. This is why it holds off a little time before unfolding and makes sure there is a genuine growth in the demand for CPU time. It gives me the impression that, for slowly growing workloads, AIX unfolds at something like one virtual processor a second, but that is an observation rather than a fact.<br /></font><b><br /> Why are we monitoring Folding? </b><font color="#000000">Well, I was keen to have this in nmon because it gives us good clues about whether we have the right Entitlement for our virtual machine. 
If (particularly when monitoring long term, like over a month) we always find that we have Folded virtual processors, particularly during our peaks in workload, then perhaps we have the Entitlement set too high. It could be dropped to let other workloads be added to the same machine. On the other hand, if we have a critical production virtual machine that has Folding = zero for long peak periods, then perhaps we should consider raising the Entitlement to make sure this virtual machine always has sufficient CPU cycles.<br /><br /><font color="#0000cd"><b> <font color="#0000ff">Folding is Leaking out!</font></b></font><font color="#0000ff"> </font>This advanced optimisation technique inside AIX was an internal secret at one time but has become known. I suspect my nmon monitoring might have accelerated that a little :-) But there are a few comments in the manual now and there are a few Folding tuning parameters available.<br /><br />The advanced AIX scheduling tuning command, &quot;schedo&quot;, has these options:<br /></font></font></span><ul><li><b> AIX6+ vpm_fold_policy</b></li><li><b> AIX6+ vpm_xvcpus</b></li><li><b> AIX6 TL06+ &amp; AIX 7</b> also have <b>vpm_fold_threshold</b> (in the schedo command but not in the documentation yet). This is the utilisation threshold at which AIX begins to unfold. In the past this was 80% and in the latest AIX release it has been lowered.<br /></li></ul><font color="#ff0000"><b>I would not go fiddling with these unless you have a particular problem to fix and have recommendations from AIX support. 
You can easily make performance worse with non-default settings.</b></font> Check the current settings, and that they are the defaults, with (as the root user): schedo -L<br /><br />In the manual pages:<br /><ul><li><b> </b><a href="https://publib.boulder.ibm.com/infocenter/aix/v6r1/index.jsp?topic=/com.ibm.aix.prftungd/doc/prftungd/virtual_proc_mngmnt_part.htm">Virtual Processor discussion in the manuals<br /></a></li></ul><ul><li>mpstat -s 1 9999 is an alternative way to &quot;see&quot; Folded virtual CPUs.<br /></li></ul><span class="st"><font color="#0000ff"><font color="#000000">Read the AIX online manual for more information on these.</font></font></span><br />I hope this helps, thanks, Nigel Griffiths.<br /><b><br /></b><b><br /></b><span class="st"><br /></span></div>
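<div>The folding heuristic described in the post for nmon (a virtual processor counts as folded only when every one of its SMT threads shows almost no clock ticks) can be sketched roughly as below. This is a minimal Python illustration, not nmon's actual C code: the function names, the threshold of 1000 ticks and the sample numbers are all assumptions made up for the example, and reading real tick counts through libperfstat is not shown.</div>

```python
from typing import Dict, List

# Assumed cut-off. The post only says folded processors show roughly
# 50 to 100 ticks instead of billions; the real nmon threshold is not given.
FOLDED_TICK_THRESHOLD = 1000


def is_folded(thread_ticks: List[int], threshold: int = FOLDED_TICK_THRESHOLD) -> bool:
    """Treat a virtual processor as folded only if ALL its SMT threads are near idle."""
    # An empty list means we have no data, so do not claim it is folded.
    return bool(thread_ticks) and all(ticks < threshold for ticks in thread_ticks)


def count_folded(vp_ticks: Dict[int, List[int]], threshold: int = FOLDED_TICK_THRESHOLD) -> int:
    """Count the virtual processors whose SMT threads all fall below the threshold."""
    return sum(1 for ticks in vp_ticks.values() if is_folded(ticks, threshold))


# Made-up sample: clock ticks per SMT thread (SMT=4) for four virtual processors.
sample = {
    0: [2_100_000_000, 1_900_000_000, 1_700_000_000, 1_600_000_000],  # busy
    1: [1_200_000_000, 80, 60, 75],  # one busy thread, so not folded
    2: [70, 55, 90, 62],  # every thread near idle: folded
    3: [51, 99, 64, 58],  # every thread near idle: folded
}

folded = count_folded(sample)
print(f"folded virtual processors: {folded} of {len(sample)}")
```

<div>Checking every thread of a virtual processor, rather than a fixed number of threads, is what keeps the count sensible when the SMT level changes (for example from SMT=2 to SMT=4).</div>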