Mainframe Performance Topics with Martin Packer - Tags - smf72

I'm a well-known mainframe performance guy, with almost 30 years of experience helping customers manage systems. I also dabble in lots of other technology. I've sought to widen the Performance role, incorporating aspects of infrastructural architecture.
I'm a world-famous podcaster and screencaster (albeit VERY thinly spread). :-)

<h1>Discovering Report Class / Service Class Correspondences (2013-05-22)</h1>

<p>It&#8217;s possible I&#8217;ve written something about this before:
My blog is so extensive now it&#8217;s hard to find out exactly what I&#8217;ve written about (and I&#8217;m going to have to do something about that).</p>
<p>I say &#8220;written <strong>something</strong>&#8221; because I know for sure I haven&#8217;t written about the SMF record field I want to introduce you to now.</p>
<h2>Previously</h2>
<p>If, when you send me data, you include Type 30 interval records, I&#8217;ll use them to relate WLM Service Classes to Report Classes:
Workload, Service Class and Report Class are all in there.</p>
<p>But these records are only for address spaces.
Address spaces that actually got created.
And therein lies a problem:
Only some of the Service Class / Report Class relationships can be gleaned this way.</p>
<p>In practice I&#8217;ve found this (incomplete but not inaccurate) information handy.
So I&#8217;d like to fill in some gaps.</p>
<h2>New News</h2>
<p>I expect you didn&#8217;t know this either - so I call it &#8220;new news&#8221;:
There&#8217;s a handy field in SMF 72 Subtype 3 (Workload Activity Report) called R723PLSC.
It has nothing to do with PSLC.</p>
<p>This is defined as the &#8220;Service Class that <strong>last</strong> contributed to this Report Class period
during this interval.&#8221;
I&#8217;ve highlighted the word &#8220;last&#8221; as it&#8217;s quite important, but we&#8217;ll come back to that in a minute.</p>
<p>This allows you to see some relationships for work that isn&#8217;t represented by address spaces, for instance DDF.
(In my test data it&#8217;s DDF I&#8217;m seeing.)</p>
<p>I&#8217;ve spent some time adding this in to my code.
Usually I&#8217;d summarise over several hours.
In this case if I do I miss stuff.</p>
<p>The emphasised &#8220;last&#8221; above means that only one of the (potentially several) Service Classes that correspond to this Report Class shows up in the record.
So I use a set of rows, each representing a short interval, to get the correspondence.
In my test data this approach yields more correspondences - because the &#8220;last&#8221; Service Class often differs from interval to interval.</p>
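<p>In case the shape of that accumulation helps, here it is as a minimal Python sketch. This is illustrative only - my actual code is SLR/REXX-based - and the row layout is invented:</p>

```python
from collections import defaultdict

def correspondences(rows):
    """Accumulate Report Class -> Service Class links across many short
    intervals. Each row carries only the *last* Service Class (R723PLSC)
    that contributed to the Report Class in that interval."""
    links = defaultdict(set)
    for _interval, report_class, last_service_class in rows:
        links[report_class].add(last_service_class)
    return links

# More (short) intervals mean more chances for each contributing
# Service Class to show up as the "last" one at least once.
rows = [
    ("09:00", "RDDF1", "SCDDFHI"),
    ("09:15", "RDDF1", "SCDDFLO"),  # a different "last" this time
    ("09:30", "RDDF1", "SCDDFHI"),
]
```

<p>Summarising over several hours would collapse those three rows into one - and lose SCDDFLO entirely. Hence the short intervals.</p>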
<p>If you use Report Classes to break out a <strong>subset</strong> of a Service Class the &#8220;last Service Class&#8221; issue doesn&#8217;t arise.
If you use Report Classes for <strong>aggregation</strong> (or in a hybrid way) it certainly does.</p>
<p>(I&#8217;m not all that keen on using Report Classes for aggregation anyway:
Decent reporting tools can do that for you.
But I could be persuaded.
I&#8217;m keener on using them for breakouts, such as DDF applications that share a common Service Class, or to break out an address space or several.)</p>
<p>I&#8217;m not claiming to have got all the Service Class / Report Class correspondences but I&#8217;ve got more of them - and for an important set of cases:
Service Classes and Report Classes that don&#8217;t correspond to address spaces.</p>
<p>As you&#8217;ll see in <a href="https://www.ibm.com/developerworks/community/blogs/MartinPacker/entry/playing_spot_the_difference_with_wlm_service_definitions?lang=en">Playing Spot The Difference With WLM Service Definitions</a> I prefer to have the WLM Service Definition to work with - and I&#8217;ll be asking for it more fervently in the future.
But you have to work with the data you can readily obtain.
And R723PLSC is a handy field to have learnt about.
You might find it useful, too.</p>
<h1>Drawing The Line (2012-03-23)</h1>

<p>You'd think it would be pretty simple to draw a line. Right?</p>
<p>This post discusses an enhancement I'd like to make to my current reporting - and I'm pretty sure that <b>technically</b> I can do it. The question is whether I <b>should</b>.</p>
<p>Consider my current &quot;Memory by address space within Service Class&quot; graph. Here's a sample:</p>
<br />
<img src="https://dw1.s81c.com/developerworks/mydeveloperworks/blogs/MartinPacker/resource/CurrentMemSC.png" width="729" />
<br />
<p>And here's what I think I might like it to look like:</p>
<br />
<img src="https://dw1.s81c.com/developerworks/mydeveloperworks/blogs/MartinPacker/resource/ProposedMemSC.png" width="729" />
<br />
<p>Obviously the line's been drawn on by hand. I haven't written any code to achieve the enhancement. And, yes, the data's real - apart from the drawn-on line. I feel pretty safe (on behalf of the customer) in showing you this as it's VERY generic. But, no, I can't promise the drawn-on line's in the right place.</p>
<p>Let's talk about:</p>
<ul>
<li>Motivation and Usage</li>
<li>Mechanics</li>
</ul>
<br />
<h3>Motivation and Usage</h3>
<br />
<p>When I throw graphs at you I see myself as &quot;story telling&quot;. Hopefully an accurate story, certainly one I believe in. So, when working on my code I ask the question &quot;how does this affect the story telling?&quot;</p>
<p>Here's how I normally tell the (e.g.) CPU story:</p>
<ol>
<li>Talk about CPU usage by processor pool by LPAR<sup>1</sup> and stacked up to give the machine view.</li>
<li>Break down CPU usage by WLM Workload and the Service Class<sup>2</sup> - again by pool.</li>
<li>Likewise by address space within a Service Class.</li>
<li>Possibly break down address space CPU to e.g. Transaction - assuming CICS or DB2 are &quot;in play&quot;.</li>
</ol>
<p>When you've done that you certainly know where the CPU is going. You do the same thing for memory - right until you get to Step 4.</p>
<p>The concept of &quot;capture ratio&quot; is well known and bridges the gap between Step 1 and Step 2 - for CPU<sup>3</sup>. It doesn't make sense to draw the proposed line for this case.</p>
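<p>(For the record, capture ratio is just a quotient. A minimal sketch, with made-up numbers:)</p>

```python
def capture_ratio(workload_cpu_seconds, lpar_cpu_seconds):
    """Fraction of the LPAR-level CPU view (Step 1) that the WLM
    Workload / Service Class view (Step 2) accounts for."""
    return sum(workload_cpu_seconds) / lpar_cpu_seconds

# e.g. SMF 72 workloads accounting for 85% of the SMF 70 LPAR view
ratio = capture_ratio([1200.0, 400.0, 100.0], 2000.0)
```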
<p>To bridge between the Service Class level and the Address Space level (Step 2 to Step 3) I think a different treatment is required. There are a number of reasons for this:</p>
<ul>
<li>Some service classes have no address spaces. And hence no memory. &quot;Capture Ratio&quot; may be 100% but unlikely to be computed that way. <img src="https://www.ibm.com/developerworks/community/blogs/images/smileys/smile.gif" class="smiley" alt=":-)" title=":-)" /></li>
<li>The chart I'm proposing has up to 15 address spaces on it. (We could make it more but then it becomes markedly less readable.) So, for a Service Class with more than 15 address spaces we miss some - as in this particular example. I'd like to show we had good (or bad) coverage of the &quot;headline&quot; Service Class number in these 15 address spaces. This works fine for CPU, memory and EXCPs.</li>
<li>Type 30 memory numbers behave badly and it would be nice to see how badly compared to the Service Class total. (Type 30 CPU numbers don't behave badly.)</li>
</ul>
<p>So I think the line that says what the total &quot;should&quot; be is ideal for this. Hence my proposal<sup>4</sup>.</p>
<br />
<h3>Mechanics</h3>
<br />
<p>Today the data is in two tables: A Service Class (Period) table and an Address Space table - both summarised at an interval level<sup>5</sup>. The former comes from RMF SMF 72 Subtype 3. The latter comes from SMF 30 Subtypes 2 and 3. It's always interesting handling two different data sources as if they might <b>magically</b> corroborate each other. How naive. <img src="https://www.ibm.com/developerworks/community/blogs/images/smileys/smile.gif" class="smiley" alt=":-)" title=":-)" /></p>
<p>I use standard SLR "PRINT CHART" and similar commands against these tables. Not so long ago I learnt how to drive GDDM graphing direct from REXX. Because I can do other things in the REXX (like adjust address space names to add e.g. "CICS") I might take that route rather than using PRINT CHART. And there are some other cases I would want REXX's sophistication to take care of - like either the 30's or the 72's being missing.</p>
<p>In your case you can probably bring the two together quite neatly. Anyone know if MXG already does this?</p>
<br />
<h3>Conclusion</h3>
<br />
<p>So, why am I blogging about this? Two reasons:
</p>
<ul>
<li>Because you might want to try the same depiction idea.</li>
<li>Because I'd like to know if you think this is a good idea.</li>
</ul>
<p>So I'd like your input on this. (Commenting here would be fine or any other way you want.) And maybe next time I crunch your data the story will be told just that little bit better. At least that's the plan. <img src="https://www.ibm.com/developerworks/community/blogs/images/smileys/smile.gif" class="smiley" alt=":-)" title=":-)" /></p>
<br />
<hr />
<br />
<p><sup>1</sup> Nowadays those pools are: GCP, ICF, zIIP, zAAP, and IFL.</p>
<p><sup>2</sup> I've not found much value in breaking CPU usage down by Service Class Period.</p>
<p><sup>3</sup> For memory I handle it differently - because there are reported-on memory usages that are outside of the Workload / Service Class hierarchy. And I explicitly calculate an &quot;Other&quot; category - which has <b>never</b> turned out to be negative.</p>
<p><sup>4</sup> Today I'd be showing you two charts and inviting you to do the comparison. I hope my proposal makes this quicker and smoother.</p>
<p><sup>5</sup> This interval may be different in the SMF 30 and 72 records but it's summarised to the same interval in the code. This might be 15 minutes, 30 minutes or (most usually) 1 hour. And that's all summarised at the &quot;shift&quot; level for even broader brush work.</p>
<h1>Would You Like More WLM Information In DB2 Accounting Trace - And How Would You Use It? (2012-02-06)</h1>

<p>I was lucky enough to be in Silicon Valley Lab for DB2 BootCamp last week. There I ran into a DB2 developer I've worked very successfully with in the past - John Tobler.</p>
<p>(He's the guy I look to for questions and issues with DB2 SMF data.)</p>
<p>We had a good discussion about something I'd personally like to see in DB2 Accounting Trace - more WLM information - and this post is as a result of this conversation.</p>
<p>Two salient pieces of information:</p>
<ol>
<li>Accounting Trace already has a field for WLM Service Class (QWACWLME) but it's only filled in for DDF work.</li>
<li>As Willie Favero pointed out in <a href="http://it.toolbox.com/blogs/db2zos/apar-friday-wlm-information-is-now-part-of-the-display-thread-command-48467">APAR Friday: WLM information is now part of the DISPLAY THREAD command</a> the command now has some WLM information in it.</li>
</ol>
<p>Putting these two together you come to the conclusion it might <b>technically</b> be possible to get more WLM information into Accounting Trace. That, of course, doesn't mean it's going to happen. I have to stress that before going any further. But it's worthwhile thinking about what's needed and how useful that would be to customers.</p>
<h3>What Should Be Added?</h3>
<p>Uncontroversially, I think, QWACWLME should be filled in with Service Class for <b>all</b> work types. I say &quot;uncontroversially&quot; because - if it can be done cheaply - it's just using space that's already in the record. I don't know if it can be done cheaply, though.</p>
<p>More controversial - because, taken together, they represent 18 additional bytes in <b>each</b> 101 record - are:</p>
<ul>
<li>WLM Workload</li>
<li>WLM Report Class</li>
<li>WLM Service Class Period</li>
</ul>
<p>I think I could live without Workload but it seems a shame to exclude it.</p>
<p>As Willie points out, Performance Index (PI) is also in the DISPLAY THREAD command. But I think we can get that from the RMF Workload Activity Report (SMF 72), and that's probably the better place to get it from.</p>
<p>But the key question is "how useful and important would this extra information be to you?"</p>
<p>Let me outline three areas of use I can immediately see...</p>
<h3>Understanding Not Accounted For Time</h3>
<p>This time bucket is what you get when you subtract all the time buckets we know about from the headline response time. The two most important causes for this are CPU Queuing and Paging Delay.</p>
<p>If we calculate this time for a record and we know the (behaviour of) the WLM Service Class it's in we can understand this time better. A bugbear of doing DB2 performance is just this: understanding whether work is subject to queuing or not. (For Paging Delay as a cause of Not Accounted For Time we could do much the same thing.)</p>
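<p>As a sketch of the arithmetic - DB2's actual field names and suspension buckets vary, so treat this as the shape of the calculation, not the letter of it:</p>

```python
def not_accounted_for(class2_elapsed, class2_cpu, class3_suspensions):
    """Elapsed time in DB2 not explained by CPU or by the known Class 3
    suspension buckets - the residue this section is about, commonly
    driven by CPU queuing and paging delay."""
    return class2_elapsed - class2_cpu - sum(class3_suspensions)

# 10s elapsed, 4s of CPU, 5s of known suspensions -> 1s unexplained
gap = not_accounted_for(10.0, 4.0, [2.0, 1.5, 1.5])
```

<p>Knowing the Service Class that gap occurred in is what would let you say whether CPU queuing is a plausible explanation.</p>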
<h3>Understanding The WLM Aspects Of DB2 Work</h3>
<p>It would be useful to be able to break down the work coming into a DB2 subsystem by Service Class, Goal and Importance, wouldn't it? In particular it would be nice to see the hierarchy of goals and importances, and to be able to relate the work's WLM attributes to those of address spaces such as DIST and DBM1. (In the former case discovering that the TCB's in the DIST address space were subject to pre-emption by the DDF work would be a blow.)</p>
<h3>Correlating Service Class And Report Class For DDF Work</h3>
<p>For non-enclave work I use the Report Class and Service Class in Type 30 to establish how these relate to each other (and what kind of work has which RC and which SC). I can't do it for DDF work because there's no usable Type 30 (i.e. with this kind of information in). If the 101 record had these both in you could extend the method.</p>
<p>(In case you wonder what I'm talking about see <a href='https://www.ibm.com/developerworks/mydeveloperworks/blogs/MartinPacker/entry/what_s_in_a_name18?lang=en_us'>What's In A Name?</a>.)</p>
<p>This still <b>doesn't</b> help us in the non-DDF enclave cases, of course.</p>
<h3>Over To You</h3>
<p>What do you think? I've listed three categories of value that immediately spring to mind (and that's with the disbenefit of jetlag so maybe not that articulately expressed). But I'd really like to know if this would be of value to you - and to modify the proposal if you think you'd like something slightly different.</p>
<p>There's no guarantee this will get done - and it's a bit of an attempt at a "Social Requirements Gathering" process. But it's worth debating in public, I think.</p>

<h1>What's In A Name? (2011-12-16)</h1>
<div>This is the post I was going to write before the discussion that led to <a href="https://www.ibm.com/developerworks/mydeveloperworks/blogs/MartinPacker/entry/cics_vsam_buffering14?lang=en_us">CICS VSAM Buffering</a> arose. It's about getting more insight into how WLM is set up and performing than RMF Workload Activity Report data alone allows.</div><div> </div><div>I recognise some of this can be done with the WLM policy in hand. But this is about an SMF-based approach. (The piece you can't do with SMF is discerning the WLM classification rules.) And the policy can't answer questions about how systems actually behave.<br /></div><div> </div><div>There are two distinct problems I've worked on solving (relatively) recently. I share the outline of my solution to each of these with you here.<br /> <br /><ul><li>In RMF you can't tell how Report Classes and Service Classes relate to each other: In some cases Report Classes break down Service Class data - often to the address space level. In some cases Report Classes coalesce information from multiple Service Classes. But you can't see this linkage in RMF.<br /> <br /></li><li>In RMF you can't necessarily tell what runs in each Service Class. I say &quot;necessarily&quot; because you <b>can</b> tell some things about the nature of the work in a Service Class.</li></ul> </div><div>The &quot;What's In A Name?&quot; in the title refers to the fact a Workload, Service Class or Report Class <b>name</b> is just a string of characters: Rhetorically it might be a &quot;promise&quot; but it's not a mechanistic guarantee. 
So - to me at least - it's worth knowing rather more.<br /></div><div> </div><div><u><b>Report And Service Class Relationships</b></u><br /> <br />SMF 72 Subtype 3 RMF Workload Activity Report data describes how Service Class Periods and Report Classes perform.<br /> <br />Type 30 Interval records (Subtypes 2 and 3) describe how <b>address spaces</b> perform. (Actually so do Subtypes 4 and 5, which are step-end and job-end records.) These records contain, amongst other things, WLM Workload, Service Class and Report Class names - for the address space. You can therefore use Type 30 to relate Workload and Service Class to Report Class. My code's done this for some time.<br /></div><div> </div><div>Type 30 does not apply to Service Classes that don't own address spaces. Two examples of this are DDF Transaction Service Classes and CICS Transaction Service Classes.<br /></div><div> </div><div>A related topic is which Service Classes are serving other Service Classes. For example CICS Region Service Classes and transaction Service Classes. Now this you <b>can</b> readily discern from SMF 72 alone. (And of course my code does that.)<br /></div><div> </div><div><b><u>What Work In A Service Class Is</u></b><br /></div><div> <br /></div><div>(This piece relates equally to Report Classes.)<br /></div><div> </div><div>As I said, you can't tell much about what a WLM Service Class covers from Type 72. So, as well as the correlation described above, my code uses Type 30 to flesh out what a Service Class is for. The key to this is the Program Name. For example CICS regions have PGM=DFHSIP. So a Service Class with just PGM=DFHSIP address spaces is just a CICS Region Service Class. Simple enough. 
Some are more complicated than others - perhaps necessitating the 16-character program name field which, for Unix, includes the last portion of the Unix program name.</div><div> </div><div>You can play other games, too: The job name for a DB2 address space can be decoded to glean the subsystem it belongs to. Certain System address spaces have mnemonic Procedure names. And so on. </div><div> </div><div>From SMF 72 you can obtain the number of address spaces for a Service Class - 0 suggesting the Service Class doesn't own any (see above). 1 suggests this class (possibly a Report Class) is there to provide more granularity. You can also get the number of address spaces In and the number Out-And-Ready. This can help you form a picture of e.g. &quot;low use&quot; address spaces in the Service Class.<br /> <br /> </div><div> </div><div>This post is about sharing some of my experience of trying to extend the value that can be got out of SMF - beyond the obvious. Some of this will probably appear in my <a href="https://www.ibm.com/developerworks/mydeveloperworks/blogs/MartinPacker/entry/i_know_what_you_did_last_summer7">I Know What You Did Last Summer</a> presentation - which I'm <b>still</b> hoping to complete soon. This also, by the way, explains why I'm so keen to get Type 30 data from you when you're sending me RMF data. There really is a huge amount of value to be had.<br /></div>
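<p>A toy version of the program-name game, in Python. PGM=DFHSIP is from this post; any other names here are placeholders, and real classification would lean on your site's naming conventions too:</p>

```python
# DFHSIP is the CICS region program; treat the rest as illustrative.
PROGRAM_HINTS = {
    "DFHSIP": "CICS region",
    "IEFBR14": "null step",
}

def characterise(program_names):
    """Given the program names seen (via SMF 30) for a Service Class's
    address spaces, say what kind of work the class looks like."""
    kinds = {PROGRAM_HINTS.get(p, "other") for p in program_names}
    if len(kinds) == 1:
        return kinds.pop()           # homogeneous, e.g. pure CICS
    return "mixed: " + ", ".join(sorted(kinds))
```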
<h1>zAAP and zIIP Delay (2011-10-01)</h1>
<div>I was going to start this post with an apology. But, as any sensible blogger would, I left it a few days to write this. Now I realise that there's a wider point than the &quot;I was wrong&quot; one. (But I <b>was</b> wrong - in a way that I think many other people might've been wrong too.)</div><div> </div><div>So let me talk about two things in this post:<br /> <br /><ul><li>zAAP and zIIP Delay.</li><li>How I came to be wrong and what we can all learn from it.</li></ul></div><div><hr size="2" width="100%" /><b> <br /></b><u><b>zAAP and zIIP Delay</b></u></div><div> <br />The fields we're talking about here appear on the RMF Workload Activity Report and are - in the Type 72-3 record<b> </b>- R723IFAD (for zAAP) and R723SUPD (for zIIP). Personally I use the latter as shown in <a href="https://www.ibm.com/developerworks/mydeveloperworks/blogs/MartinPacker/entry/zaap_and_ziip_revisited_at_last98?lang=en_us">zAAP and zIIP Revisited (At Last)</a>. </div><div> </div><div>They are described as &quot;zAAP delay samples&quot; and &quot;zAAP using samples&quot;.<br /> <br />(Hereafter I'm going to drop the &quot;or zIIP&quot; bit. And, by the way, zAAP-on-zIIP doesn't affect the discussion significantly.)</div><div> <br /></div><div>But what does R723IFAD mean? I had assumed (perhaps mentally fuzzily) that it meant &quot;delay samples because zAAP was unavailable&quot; (likewise zIIP). So my recommendation would have been that the way around it was to provision more specialty engine capacity to the service class period. </div><div> </div><div>It turns out that's not the right interpretation of the field. Here's a better one:<br /> <br />For a delay sample to be declared for a service class period for zAAP all of the following criteria have to be met:<br /></div><div> </div><div><ul><li>We were trying to run zAAP-eligible code. (I think we knew this.)</li><li>No zAAP could run the work. 
(My basic assumption.)</li><li>No general-purpose engine (GCP, in my parlance) ran the work. (<span style="font-weight: bold;">This</span> is the new bit.)</li></ul> <br />So, seeing significant samples in the &quot;Delay for zAAP&quot; field means we not only didn't get to run on a zAAP but we also didn't run on a GCP. And the implication here is that understanding all this requires us to weave in the GCP view. We could be short on <span style="font-weight: bold;">both</span> zAAP and GCP capacity.<br /></div><div> </div><div>Now, I would guess the &quot;take home&quot; is <span style="font-weight: bold;">still</span> provision more zAAP capacity to the service class period - <span style="font-weight: bold;">if you want to increase its velocity and Delay for zAAP is the major issue</span>. (There are some rare cases where that mightn't be right - given processors come in integer numbers.) But the reasoning is a little different: You'd rather run zAAP-eligible work on a zAAP than on a GCP, I would think.<br /></div><div> </div><div>For completeness, the &quot;CPU Delay&quot; field (R723CCDE) is for non-zAAP-eligible work and &quot;CPU Capping Delay&quot; (R723CCCA) is also for non-zAAP-eligible work. (Helpfully the SMF manual states that R723CCCA is <span style="font-weight: bold;">not</span> a subset of R723CCDE.) If R723CCDE / R723CCCA come into play, then, it's about provisioning GCP capacity - or, distantly, finding a way to make more of the work eligible for zAAP.<br /></div>
If some metric is called &quot;splodgeness&quot; and the value is high we say &quot;<a href="http://en.wikipedia.org/wiki/Splodgenessabounds">splodgeness abounds</a>&quot; <img src="https://www.ibm.com/developerworks/community/blogs/images/smileys/smile.gif" class="smiley" alt=":-)" title=":-)" /> without necessarily giving it too much thought. But what is this &quot;splodgeness&quot; whereof we speak?</div><div> </div><div>Often all we get is the description &quot;zAAP delay samples.&quot; (If you think we get more then do please look at the SMF manual's description for R723IFAD.) So we tend to:<br /> <br /><ul><li>Cling to the certainty the existence of a particular metric gives us. I think we're grateful to have the metric. After all, consider the counterproposition.<br /></li><li>Invent for ourselves an interpretation of what the metric means. I say, perhaps rudely, &quot;invent&quot; because who's to say if we have the right interpretation? We have to gain a foothold somehow. So actually I'm entirely sympathetic.<br /></li></ul> <br />So, in response to a customer question I set off in search of an answer to the question &quot;what does R723IFAD mean?&quot; I have a friend in RMF who mentioned they got the number from WLM and suggested I ask a mutual friend in WLM. He, very usefully, pointed me at Dan Rosa in Systems Software Development in Poughkeepsie. Dan and I chatted for well over an hour and he helped form a very good understanding of what this field means. So many thanks to Dan! </div><div> </div><div>Now, I count myself as <span style="font-weight: bold;">very</span> lucky in having friends in such useful places. I realise that's a privilege. And I don't tend to bombard them with questions about each and every SMF record's field.</div><div> </div><div>So, I think there are lots of fields like that. I've stumbled across a fair few. It would be nice to revive the old RMF Field Description manual (last updated in the early 1990's). 
I don't think that's going to happen, unfortunately. And it would take forever to bring it up to date.<br /> <br />But I do think it's legitimate to gain an understanding of where a field came from, why it was invented, and how it behaves. And that's what I try to do - for fields I think tell a useful story. And that's part of why I actually <span style="font-weight: bold;">like</span> questions about fields, and part of why I feel like a &quot;kid at Christmas&quot; whenever new data arrives: It gives me a chance to see how this stuff behaves and how y'all're using our hardware and software.</div><div> </div><div><hr style="width: 100%; height: 2px;" /> <br /> </div><div>So, in conclusion, we learn and grow together. But there's always room for better understanding. I guess I knew all this. Tacitly, I expect a lot of you will share my experience.<br /></div>
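<p>Going back to the delay criteria earlier in this post: they reduce to a three-way predicate per sample, which is perhaps easier to see as (purely illustrative) code than as prose:</p>

```python
def is_zaap_delay_sample(zaap_eligible, ran_on_zaap, ran_on_gcp):
    """A zAAP delay sample needs ALL of: the work was zAAP-eligible,
    no zAAP ran it, and - the new bit - no GCP ran it either."""
    return zaap_eligible and not ran_on_zaap and not ran_on_gcp

# Eligible work picked up by a GCP does NOT count as zAAP delay -
# which is exactly the faulty assumption this post corrects.
crossed_over = is_zaap_delay_sample(True, False, True)
```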
<h1>Batch Capacity Planning, Part 2 - Memory (2011-09-25)</h1>
<div>I can't believe it's been almost a week since I wrote <a href="https://www.ibm.com/developerworks/mydeveloperworks/blogs/MartinPacker/entry/batch_capacity_planning_part_1_cpu11?lang=en_us">Batch Capacity Planning, Part 1 - CPU</a>. Where did the time go? <img src="https://www.ibm.com/developerworks/community/blogs/images/smileys/smile.gif" class="smiley" alt=":-)" title=":-)" /></div><div> </div><div>Re-reading it I'm struck by the overwhelming theme of Batch's unpredictability and lumpiness. This is true of memory, as well, but to a much lesser degree.<br /> <br />Why to a lesser degree? Well, in most systems I look at the memory usage is mostly fairly constant and dominated by big &quot;server&quot; address spaces (such as DB2) or else CICS regions (and the analogue in the IMS world). So the variability of the Batch workload is overshadowed by the constancy of these other address spaces.</div><div> </div><div>I oversimplified there just a tad. For example:<br /> <br /><ul><li>A few customers alter their DB2 buffer pools - using the, unsurprising, ALTER BUFFERPOOL command. (They increase the size and sometimes change the various thresholds, particularly if they perceive Sequential processing to dominate overnight.)</li><li>Many customers shut down some, if not all, of their CICS regions overnight. Decreasingly so, though. Likewise IMS MPRs.</li></ul></div>
This works fine for Non-Swappable work (such as CICS and DB2 and System). It doesn't work well for Swappable work (such as TSO and Batch): When you're logically swapped you're said not to accumulate (memory) service. So Swappable work is under-counted.</li></ul> </div><div>This has been such a nuisance that my standard Memory graph has to take it into account: I take the Online frames and subtract all the non-workload frame queues - LPA, CSA, SQA and the like - and the Available Frames. These all come from SMF 71. What's left ought to be workload-related memory usage. So I then subtract all the memory usage (R723CPRS-driven) for all the workloads - including Batch and TSO. What's left I call &quot;Other&quot;.</div><div> </div><div>For the most part Other is the under-representation of Swappable workloads - Batch and, distantly, TSO. But I sometimes see a veneer of constancy - which could be a miscalculation or another category of memory entirely. But at least I can add &quot;Other&quot; into the Batch workload to get a reasonable estimate.<br /> <br />What I do note is that, generally, there's lots of memory in the Available Frames category i.e. Free - during the Batch Window. So it's almost always true that - from the Memory perspective - Data In Memory (DIM) or Parallelism could be increased.</div><div> <br /></div><div>I find the SMF Type 30 data to be almost entirely useless for memory usage purposes - especially for Batch. About the only thing you can do with it is to use the Virtual Memory fields and pretend that each virtual page allocated is backed by real memory. Which we all know to be a (usually) gross overestimate. So I don't actually do that.</div><div> <br /></div><div>For Batch, though, the biggest user of memory is typically DFSORT. Now there we have <b>good</b> news: The instrumentation does a nice job of summarising peak memory exploitation: Whether Dataspace, Hiperspace or Large Memory Object sorting. Note: I say &quot;Peak&quot; here. 
You might be able to do something to turn that into an average, but that would be a little fraught.</div><div> </div><div>All the above has been about actual usage. What about projecting forwards? If you know nothing's going to change, the picture is likely to be static. If you think you're going to exploit DIM or Parallelism it's much more difficult, for the same reasons as it was for CPU. But there's an additional reason:<br /> <br />If you were to be very successful at buffering something - especially with only a small number of buffers - you would hold onto the memory for less time. By exploiting DIM well, the &quot;area under the curve&quot; could actually go <b>down</b>. Such a case might be using VSAM LSR buffering. I've seen the kinds of speed-up where this is likely: 2 million I/Os down to under a thousand, for example.</div><div> <br /></div><div>So, in summary, Memory Capacity Planning for Batch shares the difficulties of CPU. But it also has a few of its own.<br /></div><div>
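The &quot;Other&quot; arithmetic described above can be sketched in a few lines. This is a minimal illustration (in Python): apart from R723CPRS itself, the variable names and all the numbers are illustrative stand-ins for values you would pull from your own SMF 71 and SMF 72-3 data, not real field names.

```python
# Sketch of the "Other" memory calculation: Online frames, minus the
# non-workload queues (LPA, CSA, SQA and the like), minus Available
# Frames, minus R723CPRS-derived usage for all workloads.

def average_frames_from_r723cprs(r723cprs: float, interval_seconds: float) -> float:
    """R723CPRS is resident page frame *seconds*; dividing by the
    RMF interval length gives average resident frames for the period."""
    return r723cprs / interval_seconds

def other_frames(online_frames, non_workload_frames, available_frames, workload_frames):
    """online_frames, non_workload_frames and available_frames come from
    SMF 71; workload_frames is a list of R723CPRS-derived averages, one
    per workload. What's left is 'Other' - largely the under-counted
    swappable (Batch and TSO) usage."""
    return online_frames - non_workload_frames - available_frames - sum(workload_frames)

# Illustrative numbers only (all in frames), for a 15-minute interval:
interval = 900.0
workloads = [average_frames_from_r723cprs(cprs, interval)
             for cprs in (1.8e9, 4.5e8, 9.0e7)]  # e.g. DB2, CICS, Batch
print(other_frames(8_000_000, 500_000, 2_000_000, workloads))  # prints 2900000.0
```

Remembering that Swappable work under-accumulates memory service, the result is a floor, not an exact Batch figure - which is exactly why it gets labelled &quot;Other&quot; rather than attributed precisely.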
</div><h2>zAAP and zIIP Revisited (At Last)</h2><div>Martin Packer, 2011-09-14</div><div>It's been an awfully long time since I wrote <a href="https://www.ibm.com/developerworks/mydeveloperworks/blogs/MartinPacker/entry/when_good_work_doesn_t">When Good Work Doesn't Go To zIIP / zAAP Heaven</a>. And too long since I posted anything at all. <img src="https://www.ibm.com/developerworks/community/blogs/images/smileys/sad.gif" class="smiley" alt=":-(" title=":-(" /> In fact I had a bunch of posts in my brain until this morning, when a customer asked me a question which turned into <b>this</b> blog post. (Those posts are still in my brain and will probably see the light of day eventually.)<br /></div><div> </div><div>They wanted to know whether there was an In-And-Ready count for zIIP or zAAP. I don't see such a thing. But what I <b>do</b> see is, in my opinion, much better. I've also presented a chart on the subject in my &quot;Much Ado About CPU&quot; presentation for some time now. I'm surprised I haven't blogged about it already.
So here it is, while it's still useful: </div><div> </div><div>Take a look at this graph:<br /><br /><a href="https://www.ibm.com/developerworks/mydeveloperworks/blogs/MartinPacker/resource/ziipSCCPU.png" target="_blank"><img alt="image" src="https://www.ibm.com/developerworks/mydeveloperworks/blogs/MartinPacker/resource/ziipSCCPU.png" style=" width:100%; display:block; margin: 0 auto;text-align: center; position:relative;" /></a> </div><div>It's for a single WLM Service Class period across 24 hours. On the vertical axis we have a couple of mixed types:</div><div><ul><li>A stack of the CPU components for the service class period, as a percentage of an engine. These are the bars.<br /></li><li>A yellow line representing the percentage of WLM samples that represent zIIP delays.</li></ul>In this example it's a relatively benign case. Here's how I read it:</div><div><ul><li>The red bar values are &quot;GCP on GCP&quot; - the work that was never eligible for zIIP. It's normal for it to run this way.<br /></li><li>The blue bar values are &quot;zIIP on zIIP&quot; - the work that was eligible for zIIP and actually ran on zIIPs. This is goodness.<br /></li><li>The green bar values are &quot;zIIP on GCP&quot; - the work that, while eligible for zIIP, ran on GCPs. This is what you'd like to minimise.</li><li>Because almost half ran on GCP I conclude this is DDF work. (In fact the name of the service class, and the fact it ran on zIIP before zAAP-on-zIIP, confirm this. The regularity of the pattern and split also corroborate it.)</li><li>At times when the delay samples tick up, so does the &quot;zIIP on GCP&quot;. <span style="font-weight: bold;">This is the key correlation</span>.</li><li>In fact the &quot;zIIP on GCP&quot; portion isn't all that bad. I've seen worse and I've seen a little better.</li></ul>This is a standard part of my reporting.
Hence I'm in a position to say &quot;I've seen worse and I've seen a little better.&quot; I have <span style="font-weight: bold;">experience</span> these days. <img src="https://www.ibm.com/developerworks/community/blogs/images/smileys/smile.gif" class="smiley" alt=":-)" title=":-)" /></div><div> </div><div>Some observations:<br /><ul><li>It's probably fair to use normalised CPU - particularly as there are many machines where the specialty engines run at full speed and the general purpose ones don't.</li><li>It's probably a good idea to add in the &quot;Delay for GCP&quot; sample percentage. I was sensitised to this by a customer both of whose pools were - for the service class period in question - showing serious delay samples.</li><li>In general this chart has both zIIP and zAAP on it for the same service class period. The technique works for zAAPs, zIIPs and zIIP on zAAP just the same.</li><li>Talking of zAAP on zIIP: I expect the numbers to all look like zIIP: There are no specific zAAP on zIIP metrics.<br /></li><li>This technique allows you to evaluate things like IFAHONORPRIORITY and IFACROSSOVER as they work out at the service class period level.</li><li>The PROJECTCPU mechanism only works for workloads that are already running. For example, turning on IPSEC is a fresh workload: It won't show up until you run it (whether on a GCP or a specialty engine).</li><li>If an exploiter changes how it behaves (for example DB2 DDF with PM12256) you'll see <span style="font-weight: bold;">some</span> clues in this chart. I say &quot;some&quot; because in that particular APAR the variability in outcome at the thread instance level is <span style="font-weight: bold;">not</span> going to show up here. 
It might show up in DB2 Accounting Trace (SMF 101) if you go looking for variability at the individual SMF record level.<br /></li></ul>I think the graph works well (even if the colours etc don't) and I think it's a chart you can replicate and build on (including the customer who asked the original question).<br /></div><div> </div><div>One final (meta) point: I hope that if you ask me a question that's of wider interest you won't mind me posting the answer (perhaps extended) as a blog post like this. Of course I'll shoot you the link. And, as you've seen in the past, of course I'll avoid posting your data - unless you OK me doing so.<br /></div>
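For anyone wanting to replicate the chart, the bar arithmetic itself is simple. Here's a minimal Python sketch: it assumes you have already derived three CPU-time components per service class period from SMF 72-3; the input names are hypothetical stand-ins (not SMF field names), and - per the observations above - no normalisation for engine speed is applied.

```python
# Sketch of the stacked-bar arithmetic: split a service class period's
# CPU into "GCP on GCP", "zIIP on zIIP" and "zIIP on GCP", each as a
# percentage of one engine.

def percent_of_engine(cpu_seconds: float, interval_seconds: float) -> float:
    """CPU seconds over an interval as a percentage of one engine
    (100% == one engine fully busy for the whole interval)."""
    return 100.0 * cpu_seconds / interval_seconds

def chart_bars(gcp_seconds, ziip_seconds, ziip_on_gcp_seconds, interval_seconds):
    """gcp_seconds is ALL time on general purpose engines, of which
    ziip_on_gcp_seconds was zIIP-eligible; ziip_seconds ran on zIIPs.
    'GCP on GCP' is therefore the never-eligible remainder."""
    return {
        "GCP on GCP": percent_of_engine(gcp_seconds - ziip_on_gcp_seconds, interval_seconds),
        "zIIP on zIIP": percent_of_engine(ziip_seconds, interval_seconds),
        "zIIP on GCP": percent_of_engine(ziip_on_gcp_seconds, interval_seconds),
    }

# Illustrative 15-minute interval: 450s on GCPs (180s of it
# zIIP-eligible) and 270s on zIIPs.
bars = chart_bars(450.0, 270.0, 180.0, 900.0)
print(bars)  # GCP on GCP: 30.0, zIIP on zIIP: 30.0, zIIP on GCP: 20.0
```

Stack the three values per interval and overlay the zIIP delay sample percentage as the line; the correlation to watch is delay samples rising together with &quot;zIIP on GCP&quot;.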