Monday, November 4

Monday, November 4, 08:00 - 11:30

111 (Workshops): Top Performance Metrics for Capacity Management of Virtualization

Jie Lu

Room: Aventine A/B

This workshop will give a tutorial on the most important performance metrics for capacity management of virtual servers. For each of the metrics, we will discuss the semantics of the metric; how it is measured; why it is important for capacity management; how it reveals performance issues; and possible ways of fixing the problems. By classifying the platforms and metrics, the workshop will offer best practices for performance and capacity studies of the various virtualization platforms. It will cover most of the popular virtualization platforms, including VMware vSphere, IBM PowerVM, AIX WPAR, HP partitions, HP Integrity VM, HP-UX Containers, Oracle Dynamic Domains, Oracle VM Server, Solaris Containers, Microsoft Hyper-V, Xen, and Linux KVM.

112 (Workshops): What Every Performance Engineer Should Know: PEBoK and CMG101

Peter HJ Van Eijk, Digital Infrastructures

Room: Aventine C

In this intermediate-level workshop we will share our proposal for a PEBoK: Performance Engineering Body of Knowledge. It encompasses a common set of concepts and theory (as currently being elaborated in MeasureIT under the name 'performance engineering cookbook') as well as a comprehensive set of roles and skill sets.

We'll cover the concepts, the skill sets, the roles, and the potential certification structure and ongoing maintenance.

Audience participation is encouraged.

113 (Workshops): Eliminate the Blame Game Between z/OS, CICS and DB2

Ivan Gelb, GelbIS

Room: Aventine D

Many performance, capacity and availability problems linger unresolved due to the complexity of the z/OS, CICS, and DB2 environments and the resulting blame game, also known as finger pointing, among the different subsystems. This workshop presents time-tested and proven techniques for analyzing performance, capacity and availability which unfailingly eliminate the blame game.

451 Advisors senior planning experts and thought leaders will discuss blind-spots, challenges and successful methodologies for creating a Digital Infrastructure Roadmap. What capacities are important to measure? How should third-party services like Cloud and Colocation play into your strategic planning? What are the next steps beyond the first round of server virtualization? This session will help your organization conduct capacity planning that integrates business, technology and the associated data center facilities infrastructure requirements.

Cloud Native applications are focused on speed and automation to support continuous delivery of new features as many small steps rather than a large bundled release. In a world where resources are ephemeral, bought by the minute, hour or gigabyte rather than purchased up front and depreciated over three years, there are new concepts, tools and optimization techniques to ensure that there is always enough capacity to run the work, maintain high availability, and enable high delivery velocity, while the cost of that capacity is minimized. There are also new challenges for performance monitoring tools, which must clean up correctly after ephemeral resources are no longer in use.

In this workshop Adrian Cockcroft will describe the patterns and tools that have been developed at Netflix and elsewhere to support the Cloud Native platform collectively known as NetflixOSS. Most of the Netflix tools are available as open source downloads from http://github.com/netflix and have also been integrated with other open source tools such as Graphite and Riemann.

Benchmarking is extremely effective in cloud-based systems because extremely large configurations can be created quickly, tested, then removed for a relatively small cost. The results of cloud benchmarks have to be interpreted carefully, to understand how the underlying infrastructure platform variance affects the results. Several benchmarks will be explored in detail.

Resource usage costs for AWS can be visualized with tools from several vendors, or the Netflix Ice project. A six-step program of techniques for cost optimization of applications on AWS in particular (but which applies to most other cloud vendors) has been developed and will be explained in detail.

In past years Adrian has presented workshops at CMG including Capacity Planning with Free Tools, and TCP/IP Tuning.

Monday, November 4, 12:00 - 13:00

Workshops: Lunch for Workshop Attendees

Located in Pavilion (glass building across patio)

Monday, November 4, 13:15 - 16:45

141 (Workshops): Cloud Database Performance Evaluation

Charles Levine, Microsoft

Room: Aventine A/B

How can we evaluate the performance of databases running in the cloud? As an industry we have a lot of experience evaluating database performance. Is a database running in the cloud really any different? If yes, how and why? These questions (and their answers) frame the content of this workshop. In the workshop, we will present a model for measuring cloud database performance. We will show the schema, database generator, workload description, benchmark driver, and results. The workshop will be focused on demos and walkthroughs, showing how the workload is constructed, how to run it, and how to analyze and interpret the results.

142 (Workshops): Cloud Risk Management

Peter HJ Van Eijk, Digital Infrastructures

Room: Aventine C

Performance and capacity is one of the risks you have to deal with as an IT service provider or cloud provider. In this half day workshop, certified master cloud trainer Dr. Peter HJ van Eijk will give you a crash course in cloud computing governance, risk management and compliance (GRC).

Performance engineers and capacity planners benefit from being able to frame their work as contributing to risk management and GRC. You might even become a link between technology and risk managers.

The workshop will cover essential bits of cloud computing, methods of identifying risk and value, risk analysis and design of controls (which is risk manager's slang for stuff like monitoring and proper planning), and managing compliance and its maturity.
Workshop attendees will receive a set of templates and other goodies to use in their work.

143 (Workshops): WLM - Performing a Cursory WLM Review

Peter Enrico, EPStrategies

Room: Aventine D

During this workshop Peter Enrico will walk the attendees through a cursory review of their WLM Goals and Importance Levels. In particular, Peter will instruct the attendees to do the following reviews:
• Importance Level Review
• Velocity Goal Review
• Response Time Goal Review
• Multiple Period Review
CMG attendees planning on attending this workshop are strongly encouraged to contact Peter Enrico at least a week prior to the workshop if they want to submit to him raw SMF data. For attendees that submit raw SMF data, Peter will generate a set of performance reports needed to do the WLM analysis reviews discussed in the workshop. If you want to submit Peter raw SMF data for this workshop, please send him an email at Peter.Enrico@EPStrategies.com

An intensive workshop for performance management professionals who would like to learn how to apply machine learning decision trees and analytical queueing network models to capacity planning, performance management and workload management of Big Data Warehouses based on Oracle and Teradata and Hadoop Clusters.

An Application Profile is a description of application behavior, performance, and resource consumption. As an example, a Profile can be developed using load test results to quantify the resource requirements and response time components of individual business functions. In a production environment, a Profile can describe the mix of transaction types that are processed by an application. Both examples illustrate the quantitative focus of an Application Profile; a precise description of the workloads processed by an application, their performance characteristics and resource usage.

An Application Profile is a prerequisite for application performance analysis and capacity planning.

• Application-level performance analysis is generally focused on improving the performance or capacity of an application. Creating an Application Profile should be the first step; determining what an application is doing, where it is spending its time and quantifying its resource consumption. The Profile enables the performance analyst to focus his/her efforts.

• Data center capacity planning is most effective when it is developed from a set of application-level capacity plans. The application-level capacity plans are simply another form of an Application Profile that relates workload volume to resource requirements. If the capacity planner has a complete set of Application Profiles, then the task of developing a data center capacity plan is simplified by leveraging the set of Application Profile building blocks.

This workshop will explore the concept, development, presentation and utility of an Application Profile. The following topics will be addressed:
• Terminology for decomposing an application into components suitable for profiling
• Different Profile types based on the profiling goals and available data sources
• Techniques for Application Profile development
• Methods for Profile development across the application development lifecycle; design, test and production
• Sample Application Profiles

Monday, November 4, 17:00 - 17:30

CONF: First Time Attendees Introduction to CMG

Room: Aventine C

Chair: General Chair (Computer Measurement Group, USA)

Monday, November 4, 17:30 - 18:30

AVENTINE A/B: (CONF): Annual Business Meeting and Awards Presentation

All attendees are invited!

Monday, November 4, 18:30 - 19:30

CONF: Welcome Reception

Tuesday, November 5

Tuesday, November 5, 07:00 - 08:00

CONF: Breakfast

Tuesday, November 5, 08:00 - 09:15

AVENTINE A/B 201 (Featured Speaker): Now Playing on Netflix - Adventures in a Cloudy Future

KEYNOTE SESSION

Adrian Cockcroft, Netflix (Michelson Award Winner - 2007)

Tuesday, November 5, 09:15 - 09:35

CONF: Break

Tuesday, November 5, 09:35 - 10:35

211 (Featured Speaker): The Challenges of Measuring Database Performance in the Cloud

Charles Levine, Microsoft

Room: Aventine A/B

Cloud computing is growing rapidly and gaining mainstream adoption. While there are well established benchmarks that address the conventional hardware and software market, equivalent benchmarks don't exist for the cloud design point. Performance evaluation for cloud computing is an active field with many people working to develop solutions. The gap between box and cloud is wider in the more specialized and demanding area of databases.
The need for cloud benchmarks is broader than just performance. SLAs (Service Level Agreements) are becoming an important element of the cloud across both high availability and disaster recovery scenarios. Another dimension is predictability, i.e., the ability of the cloud to provide the same performance regardless of time of day and multi-tenancy. Also, the ability of the cloud to respond as the application's resource needs change (elasticity) is important for both users and vendors. Finally, a primary goal of cloud computing is to reduce an organization's costs by shifting to a consumption model of pricing. A benchmark that provides a standard way to measure and compare SLA guarantees, predictability, elasticity, and cost will benefit engineering, marketing, and customers.

Since the birth of external storage arrays there has always been the question: how should storage be allocated on the front-end ports, guided by performance and capacity? Historically this was an intelligent guessing game with hours of discussion and crossed fingers in the end. This paper discusses a proven method that blends observed performance demand metrics and unit capacity into a single metric. The result is a ranking of front-end ports from best to worst on performance/capacity criteria. This metric is currently used to place new servers onto available SAN storage.

213 (ITSM): PANEL - System z Performance, Capacity & TCO Q&A

Ivan Gelb

Room: Aventine D

214 (CP): Transforming Time Series Data into Capacity Planning Information

Often an analyst has time series data available from performance monitors and needs to make statistical sense of it for capacity planning purposes. For example, a twenty-four hour column chart produced by averaging multiple days of time interval samples yields a statistically stable view of a resource's usage characteristics across the day and clearly identifies its busy period. Since monitoring tools often provide little support for this type of analysis, what can analysts do on their own to accomplish the needed data transformation? This paper describes the statistical manipulations to perform and suggests approaches to the task using "home grown" methods.
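The twenty-four hour averaging described in this abstract can be illustrated with a short sketch. It is not the author's method, only one "home grown" way to do it with the Python standard library; the function names `hourly_profile` and `busy_hour` and the synthetic sample data are assumptions made for this illustration.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from statistics import mean


def hourly_profile(samples):
    """Average interval samples by hour of day.

    samples: iterable of (datetime, value) pairs spanning one or more days.
    Returns a list of 24 means; an hour with no samples yields None.
    """
    buckets = defaultdict(list)
    for timestamp, value in samples:
        buckets[timestamp.hour].append(value)
    return [mean(buckets[h]) if buckets[h] else None for h in range(24)]


def busy_hour(profile):
    """Return the hour (0-23) whose averaged value is highest."""
    candidates = [h for h in range(24) if profile[h] is not None]
    return max(candidates, key=lambda h: profile[h])


# Synthetic data (hypothetical): three days of 15-minute utilization samples
# with a daily peak during the 14:00 hour.
start = datetime(2013, 11, 4)
samples = []
for i in range(3 * 96):
    ts = start + timedelta(minutes=15 * i)
    day = i // 96
    samples.append((ts, 20.0 + day + (50.0 if ts.hour == 14 else 0.0)))

profile = hourly_profile(samples)
print("busy hour:", busy_hour(profile))
```

Averaging across days smooths out single-day spikes, so the resulting 24 values can be charted directly as the column chart the abstract mentions.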

Presenter bio: Jim has worked 40 years in the telecommunications and computer industries for GTE, Tandem Computers, Siemens, and currently is the Capacity Planner for the State Of Nevada. At GTE he worked in both Data Center Capacity Planning and Digital Switching Traffic Capacity determination. While at Siemens he obtained EU and US patents for a traffic overload control mechanism used in multiple products including a VoIP Switch. He holds BS and MS degrees in Operations Research from The Ohio State University.

215 (ITSM): The Capacity Manager as an IT Leader

Many organizations have reduced staff in recent years. Some of these organizations, unaware of the purpose or necessity of Capacity Management, have eliminated Capacity Manager positions and even entire teams. This has happened even in organizations that claim to have aligned to ITIL or some other Service Management practice where Capacity Management is described as a key function or process.
In the author's personal experience, Capacity Management positions and teams at companies he used to work for (or was associated with) have disappeared or greatly shrunk over the last decade. Why?
The paper examines the causes. A key cause is a distinct lack of leadership coming from the Capacity Management function. It's not good practice these days to try to avoid being noticed within an organization and hope that the reduction or elimination axe will fall somewhere else in the organization.
So, are you, the Capacity Manager, a leader in your organization? The paper will look at the qualities of strong leaders, will examine how a Capacity Management function or team in an organization can align with or develop its own leaders, and how we can all move forward in this challenging career environment.

Presenter bio: Rich has been working in Capacity Management for 20 years, the last 13 with Metron, holding ITIL v2 Manager and v3 Expert certification. He's worked in a variety of presales and postsales consulting roles within Metron, turning his attention in recent years to product and strategic marketing.
Rich earned a BS in Mathematics from Juniata College (PA) and an MBA from the University of Wisconsin-Whitewater and is responsible for Metron's global marketing efforts.

Tuesday, November 5, 10:35 - 10:45

CONF: Break

Tuesday, November 5, 10:45 - 11:45

In 1989 SPEC benchmarks measured "raw horsepower" of the CPU. Such metrics are still vitally important, but over the years SPEC has also had to address other performance measures of concern to data center managers. Learn how SPEC is quantifying performance in network storage, Java, virtualization, cloud computing, energy efficiency, and more.

222 (ITSM): Considerations in Setting Response Time SLAs

The author explores business, technical, and human factors considerations when deciding on Response Time SLAs. Based on several years' experience in documenting, negotiating, and setting SLAs within several organizations, as well as review of recent research on end-user satisfaction, e-commerce sales conversion and site abandonment rates, the author lays out a strategy for collecting relevant baseline data, setting the stage for negotiations, and finally getting to agreement among all parties on acceptable SLAs. The author emphasizes recent studies on e-Commerce productivity (Web site Abandonment rates and Site sales conversion rates) to motivate hard-dollar foundations for SLAs.

Presenter bio: Mr. Halbig is Team Lead of Distributed Systems Performance at First Data Corporation. In this position, Dave and his team provide 3rd level support world-wide for performance and availability issues in any of the thousands of servers and hundreds of applications that make up First Data. To provide a sense of scale, First Data processed over 60 billion billable transactions in 2010.

223 (HOT): Performance Evaluation of Big Data Transmission Models

Transferring massive data sets across shared communication links is a non-trivial task that requires significant resources and coordination. There are currently two main approaches for servicing these types of big data transmissions: end-to-end and store-and-forward. In end-to-end, one or more data streams are opened between the sender and receiver, and data are transmitted directly over these links. In store-and-forward, data are transmitted from the sender to one or more intermediate nodes, before being forwarded to the receiver. This paper explains these main approaches and identifies the key input parameters for big data transmissions. The paper also provides methods for calculating performance bounds for both end-to-end and store-and-forward approaches. The bounding calculation computes the shortest time by which big data can be transmitted between two locations that may potentially span the globe, which introduces additional complexity. Current research in this area focuses on selecting the optimal routing paths and ideal storage node locations. Since these optimizations are complex and expensive, the computationally cheap bounding techniques presented in this paper can be used to quickly obtain performance estimates.
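To give a feel for what a cheap bounding calculation can look like, here is a rough sketch. It is not the paper's method: it assumes a fixed path of hops with known bandwidth and latency, treats an end-to-end stream as limited by the slowest hop, and treats store-and-forward as staging the complete data set at every intermediate node before sending it on. The function names and the example figures are hypothetical.

```python
def end_to_end_seconds(size_bytes, hop_bandwidths_bps, hop_latencies_s):
    """Best-case time for a direct stream: the slowest hop sets the rate."""
    bottleneck_bps = min(hop_bandwidths_bps)
    return sum(hop_latencies_s) + size_bytes * 8 / bottleneck_bps


def store_and_forward_seconds(size_bytes, hop_bandwidths_bps, hop_latencies_s):
    """Best-case time when each intermediate node receives the whole data
    set before forwarding it (no pipelining between hops)."""
    return sum(
        latency + size_bytes * 8 / bandwidth
        for bandwidth, latency in zip(hop_bandwidths_bps, hop_latencies_s)
    )


# Hypothetical example: 1 TB over a 10 Gbit/s hop followed by a 1 Gbit/s hop.
size = 1e12
bandwidths = [10e9, 1e9]
latencies = [0.05, 0.10]
print("end-to-end bound (s):", end_to_end_seconds(size, bandwidths, latencies))
print("store-and-forward bound (s):",
      store_and_forward_seconds(size, bandwidths, latencies))
```

Under these assumptions the store-and-forward figure is never smaller than the end-to-end one; its practical appeal lies elsewhere, for example in scheduling each hop when that link is lightly loaded, which this sketch does not model.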

Presenter bio: Adam H. Villa, Ph.D, is an assistant professor in the Department of Mathematics and Computer Science at Providence College. He received his Ph.D from the University of New Hampshire in 2012 and his BA from Wheaton College (Norton, MA) in 2003.

224 (PET): Waiting for a Black Box

In a previous paper it was shown that when working with graphical depictions of system delays, a generic approach ("French performance curve") may suffice in place of the detailed mathematics required for a true model of the system. This paper takes the proposed generic modeling approach a step further by examining some aspects of system operation, including memory requirements and component utilizations. To build the needed model, a black box is used to hide most details of the system. We find that the French performance curve can be used to describe overall system behavior.

Presenter bio: Bruce McNutt, CMG's 2009 Michelson Award recipient, is a senior scientist/engineer and master inventor working in the Systems and Technology Group of International Business Machines Corporation. He has specialized in disk storage performance since joining IBM in 1983 and has published one of the key books on that subject. Among the many papers which he has presented to the annual conference of the Computer Measurement Group, as an active participant for more than 25 years, are three that received CMG "best paper" awards.

225 (CP): Data, Data, Everywhere and Not a Bit to Use

With the arrival of the cloud, and business focus on service based reporting, capturing the correct data has never been more important. The key prerequisite for effective capacity management is to have quality data available for the analyst or planner to use. This session will discuss the challenges of capturing the sorts of data required and what type of data needs to be captured to meet the demands of the business.

Presenter bio: I first started working in the capacity management field in 2000. Initially I was involved with a product for Unisys Mainframes, and through various roles as both a user and vendor of capacity planning and management software, I have spread my experience over everything from AS400 to VMware.

Modeling and sizing techniques for capacity/performance analysis have existed for over three decades. How can these be applied to analyzing the latest virtualization strategies such as VMware's vSphere? This paper demonstrates analyzing capacity metrics from each layer of the virtualization, i.e. vSphere cluster, host, VM, and from the VM operating system itself (Linux or Windows), for Memory and CPU resources. Similarities and dissimilarities between analyzing a physical and virtual server are discussed. This is a specific application of the general approach presented in "Modeling/Sizing Techniques for Different Virtualization Strategies" from CMG 2008.

Presenter bio: Debbie Sheetz is a Sr. Staff Consultant based in BMC Customer Support, at the Lexington, Massachusetts/USA location. She provides applied solutions for performance analysis and capacity planning challenges for customers, business partners, and BMC field consultants. She works with product engineering and marketing on refining existing solutions and designing new solutions. Prior to working with Distributed Systems performance management products, she had extensive involvement with AS/400 and mainframe product support and development. Originally hired to work on the first version of BEST/1 at BGS Systems, she has 37 years experience developing and supporting capacity and performance analysis software with BMC Software/BGS Systems.
Previous presentations:
CMG 2006, 2007, 2008, 2009, 2010, 2012
US regions: DC, Chicago, New York, Connecticut, St. Louis, Midwest, Southern, Boston, Southern California
International regions: UK CMG 2007, 2009, 2010, 2011; CMG Canada (Toronto)

This paper presents a new technique to evaluate performance and automatically resolve anomalies in Java-based enterprise applications running on virtualized environments in the cloud. New business metrics for measuring and monitoring the user activity and performance data are introduced in addition to the system ones. The results of capacity analysis are shown through real-world examples.

Presenter bio: Serg Mescheryakov is a Doctor of Science, Professor of St. Petersburg Polytechnic University, Russia, specializing in computer science and engineering. He has published more than 100 scientific papers and successfully implemented a series of enterprise-class database systems. Dr. Mescheryakov has 35 years of total experience in IT as a teacher, developer, database architect, team leader, customer support engineer, and analyst, including 10 years at global IT companies in Silicon Valley, CA, USA (Enkata Technologies, RingCentral Inc., Genesys Telecommunications Laboratories).

235 (PET): Methodical Benchmarking of NoSQL Database Systems

NoSQL databases offer high performance and scalability at lower prices compared to the traditional RDBMS and have been commercially accepted and widely deployed. They also offer flexible schemas to support unstructured data. Due to their distributed implementation and the novelty of their target applications, NoSQL databases pose significant engineering challenges to application developers and database administrators who have been accustomed to the engineering of traditional centralized SQL-based RDBMS systems. The lack of experience in engineering these systems results in major production issues. Based on our experience with performance engineering of several NoSQL databases in the production environment, we came up with a methodical benchmarking methodology to characterize the performance of NoSQL databases. This methodology is generic in nature and is applicable to most NoSQL databases. We applied this methodology to characterize the performance of MongoDB. Based on our testing, we provide best practices in engineering NoSQL databases for business-critical applications.

Tuesday, November 5, 14:15 - 14:25

CONF: Break

Tuesday, November 5, 14:25 - 15:25

The importance of connection speed has long been lost in the discipline of performance engineering. For over a decade this once critical component has been collecting dust. Residential and commercial internet connections had become so fast that by the turn of the century taking connection speed into consideration for performance and capacity planning became unnecessary. However, thanks to the mobile era this component is rapidly resurfacing in our domain and is more important now than it ever was. The difference between a one second page load time on a fast desktop web browser, and a six second page load time on a mobile device with even a decent mobile connection is enormous. The impact of this ranges from the customer experience downstream to the back end architecture. Not only does it affect the user, but it may hold open downstream resources for up to 6x longer than planned. This quickly eats away at peak production capacity. In this session we take a look at real data taken from performance testing and analyze the impact of varying carrier speeds on the end-to-end transaction flow. Take away knowledge about how you can modify your testing plans and capacity models to account for the increasing amount of mobile access to your online applications.
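The capacity effect of longer holding times can be approximated with Little's Law (requests in flight = arrival rate x time each request is held). The sketch below is an illustration only, not data from the session; the function name, the traffic rate and the mobile share are hypothetical, while the one second and six second figures come from the abstract.

```python
def requests_in_flight(arrival_rate_per_s, mobile_share, desktop_s, mobile_s):
    """Little's Law: average concurrency = arrival rate * mean holding time.

    mobile_share is the fraction of requests (0.0 to 1.0) that arrive over
    the slower mobile connection.
    """
    mean_holding_s = (1.0 - mobile_share) * desktop_s + mobile_share * mobile_s
    return arrival_rate_per_s * mean_holding_s


# Hypothetical workload: 200 requests/s, 1 s desktop and 6 s mobile page loads.
desktop_only = requests_in_flight(200, 0.0, 1.0, 6.0)
with_mobile = requests_in_flight(200, 0.3, 1.0, 6.0)
print("in flight, desktop only:", desktop_only)
print("in flight, 30% mobile:", with_mobile)
```

With the same arrival rate, a 30% mobile share raises the average number of held connections from 200 to roughly 500 in this example, which is the kind of adjustment a capacity model sized on desktop response times would miss.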

In this paper, we present a complete derivation of RAID 0, 1, 5, and 6 reliability and MTTF. We define RAID system reliability as the probability of no data loss, where data loss occurs when the failures in a single RAID group exceed what its protection type can tolerate. We use a Markov model with disk MTTF, disk MTTR, number of disks per RAID group, and RAID protection type as inputs. The model outputs are system reliability and MTTF.
We focus on the RAID protection types in common use. RAID 2, 3, and 4 were used in the early days of RAID but are no longer popular due to performance problems. We focus on RAID 0, 1, 5, and 6, which are still popular today [Gibson92, RAID_Book].
RAID 0 is the terminology for no redundancy or RAID protection. In this case, a single disk failure results in system data loss. RAID 1 and 5 are variations of single parity, in which the system can tolerate one disk failure per RAID group, but two failures in the same RAID group lead to data loss. RAID 6 is an example of double parity, in which the system can tolerate two disk failures per RAID group, but three failures in the same RAID group lead to data loss.
Our method of derivation is similar to that of Trivedi [Trivedi]. However, we solve for the reliability function directly rather than solve for the reliability density function. The Markov models produce a system of first-order differential equations which we convert to a system of linear equations using the Laplace Transform.
The single parity cases of RAID 1 and 5 require factoring a quadratic polynomial using the quadratic formula. This procedure is relatively straightforward. However, the double parity case of RAID 6 requires factoring a cubic polynomial. This case is much more complicated, because the cubic formula involves complex numbers. The discriminant of the cubic polynomial for real-world cases is very close to zero, the boundary between real and complex roots. For reliability expressions, the roots must be real. A small positive discriminant leads to three real roots, two of which are very close in magnitude.
In computing numerical results for RAID 6 reliability, we encountered numerical instability caused by products of difference terms involving two roots very close in magnitude. We overcame these problems by replacing the products of difference terms with their polynomial forms.
We conclude by presenting tradeoffs between capacity and reliability for each RAID protection scheme: RAID 0, 1, 5, and 6.
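As a rough numerical companion to the abstract above, the MTTF of such a Markov model can be computed without Laplace transforms by summing expected first-passage times through the failure states. This is a different technique from the paper's derivation and yields only the MTTF, not the reliability function. It assumes exponentially distributed failure and repair times and one repair in progress at a time; the function name `raid_group_mttf` and the example figures are hypothetical.

```python
def raid_group_mttf(n_disks, tolerated_failures, disk_mttf_h, disk_mttr_h):
    """Mean time to data loss (hours) for one RAID group.

    State i means i disks are currently failed. From state i the group moves
    to i + 1 at rate (n_disks - i) / disk_mttf_h and, when i > 0, back to
    i - 1 at rate 1 / disk_mttr_h (assumption: a single repair at a time).
    Data loss is reached at tolerated_failures + 1 failed disks.
    """
    failure_rate = 1.0 / disk_mttf_h
    repair_rate = 1.0 / disk_mttr_h
    total_hours = 0.0
    previous_step = 0.0  # expected time to go from state i - 1 to state i
    for failed in range(tolerated_failures + 1):
        up_rate = (n_disks - failed) * failure_rate
        down_rate = repair_rate if failed > 0 else 0.0
        # Expected first-passage time from state `failed` to `failed + 1`.
        step = 1.0 / up_rate + (down_rate / up_rate) * previous_step
        total_hours += step
        previous_step = step
    return total_hours


# Hypothetical figures: 8 disks per group, 1,000,000 h disk MTTF, 24 h MTTR.
for label, tolerated in (("RAID 0", 0), ("RAID 5", 1), ("RAID 6", 2)):
    print(label, raid_group_mttf(8, tolerated, 1_000_000, 24))
```

For the single-parity case this recursion reduces to the familiar closed form ((2N - 1)λ + μ) / (N(N - 1)λ²), with λ = 1/MTTF and μ = 1/MTTR, which gives a quick sanity check on the output.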

Presenter bio: Dr. Andrew M. Shooman is currently a Principal Performance Engineer at EMC Corporation in Hopkinton, Massachusetts, working on performance and reliability analysis of disk-array storage systems. He received an SB in Mathematics and Computer Science from the Massachusetts Institute of Technology, an MS in Computer Science from the Courant Institute of Mathematical Sciences at New York University, and a PhD from Polytechnic University in Brooklyn, New York. His doctoral thesis topic was "Exact Graph-Reduction Algorithms for Network Reliability Analysis".
Dr. Shooman previously worked for Codex Corporation, a subsidiary of Motorola, and the IBM T. J. Watson Research Center on network reliability analysis; Grumman Data Systems on image analysis; Hazeltine Corporation on artificial intelligence and expert systems; and Pall Corporation on automated hydraulic filter test data collection and analysis. His research interests include: graph theory, probability, combinatorics, algorithms, and computational complexity.

244 (APM): Dumb and Dumber

Big Data is all the rage right now. Everyone from a social media company to your grandmother's online knitting store is suddenly a big data shop. Application monitoring tools are no exception from this trend - they collect gigabytes of monitoring data from your application every minute. But most of this data is useless. It's dumb data. More data isn't better if the data you're getting from your tools isn't helping you do your job - in fact, it's worse. In this session we'll talk about how to be a little smarter about collecting monitoring data, and how to ensure that the data we're collecting is intelligent, too. I'll talk about a few of the monitoring solutions and approaches I've used during my career as a monitoring architect at a large financial services institution, as well as present a few case studies of customers who have managed to make the leap from bigger data to smarter data.

245 (PET): Agile Aspects of Performance Testing

It looks like agile methodologies are somewhat struggling with performance testing. Theoretically it should be a piece of cake: every iteration you have a working system and know exactly where you stand with the system's performance. Unfortunately, it doesn't always work this way in practice. Performance related activities tend to slip toward the end of the project. Another issue is that agile methods are oriented toward breaking projects into small tasks, which is quite difficult to do with performance: performance-related activities usually span the whole project.
On the other hand, performance testing is rather agile in itself. Approaching performance testing formally, with a rigid, step-by-step approach and narrow specialization, often leads to missing performance problems or to the prolonged agony of performance troubleshooting. With a small extra effort to make the process more agile, the efficiency of performance testing increases significantly, and this extra effort usually pays off multi-fold even before the end of performance testing.
This paper discusses agile aspects of performance testing in detail, including both performance testing in agile projects and doing performance testing in an agile way.

Presenter bio: Alex Podelko has specialized in performance since 1997, working as a performance engineer and architect for several companies. Currently he is a Consulting Member of the Technical Staff at Oracle, responsible for performance testing and optimization of Enterprise Performance Management and Business Intelligence (a.k.a. Hyperion) products.
Alex periodically talks and writes about performance-related topics, advocating tearing down silo walls between different groups of performance professionals. His collection of performance-related links and documents (including his recent papers and presentations) can be found at www.alexanderpodelko.com. He blogs at http://alexanderpodelko.com/blog and can be found on Twitter as @apodelko. Alex currently serves as a director for the Computer Measurement Group (CMG, http://cmg.org), an organization of performance and capacity planning professionals.

This paper explores the strategies that VMware ESX employs to manage machine memory, focusing on the ones that are designed to support aggressive consolidation of virtual machine guests on server hardware. To support aggressive server consolidation, the VMware Host grants physical memory to guest machines on demand. By design, VMware allows physical memory to be over-committed, where the overall amount of virtualized physical memory granted to guest machines exceeds the amount of actual physical memory available on the machine. The paper uses a case study to illustrate what happens when the VMware hypervisor confronts a configuration of guest machines that demands access to more physical memory addresses than are available on the underlying hardware configuration. The case study provides an opportunity to observe the effectiveness of the strategies VMware employs to manage virtual memory and the potential impact of those strategies on the performance of the underlying applications running on virtualized hardware whenever there is significant contention for RAM.

The CICS Monitoring Facility (CMF) allows the addition of user fields to the CICS performance records (SMF type 110) to tag your transactions with business indicators. In that way, you can log and monitor your business using the same infrastructure. This paper goes through the steps required to implement this in your CICS environment.
Authors: Roberto Pacheco and Luiz Gazola.

Presenter bio: -Solid knowledge in Information Technology with more than 16 years working with IBM Mainframe zSeries and z/OS operating system.
-Experience with Technical Support in z/OS environments and products running in this platform. Knowledge in internal z/OS structure, troubleshooting, problem determination and dump analysis using IPCS, installation and configuration of new software products and new versions of z/OS Operating System.
-Experience developing software products for z/OS environments, good skills in internal z/OS structure, control blocks, High Level Assembler programming, C, C++, Low-level programming, multithreaded, client/server, services, communications, transactions, tuning and high-performance, subsystem and SVC.

Hadoop is a leading open source tool that supports the realization of the Big Data revolution and is based on Google's pioneering MapReduce work in the field of ultra large amount of data storage and processing. Instead of relying on expensive proprietary hardware, Hadoop clusters typically consist of hundreds or thousands of multi-core commodity machines. Instead of moving data to the processing nodes, Hadoop moves the code to the machines where the data reside, which is inherently more scalable. Hadoop can store a diversity of data types such as video files, structured or unstructured data, audio files, log files, and signal communication records. The capability to process a large amount of diverse data in a distributed and parallel fashion with built-in fault tolerance and using free software and cheap commodity hardware makes a very compelling business case for the use of Hadoop as the Big Data platform of choice for most commercial and government organizations. However, making a MapReduce job that reads in and processes terabytes of data spanning tens or hundreds of machines complete in an acceptable amount of time can be challenging as illustrated here. This paper first presents the Hadoop ecosystem in some detail and provides details of the MapReduce engine that is at Hadoop's core. The paper then discusses the various MapReduce schedulers and their impact on performance.
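The map, shuffle and reduce steps that the MapReduce engine coordinates across a cluster can be illustrated in a single process. The sketch below is a plain-Python word count, not Hadoop API code; every function name in it is hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def map_phase(record):
    """Map step: emit a (word, 1) pair for every word in one input record."""
    for word in record.split():
        yield word.lower(), 1


def shuffle(pairs):
    """Shuffle step: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce step: combine the values of one key into a single result."""
    return key, sum(values)


def run_job(records):
    """Run map, shuffle and reduce sequentially over an iterable of records."""
    pairs = (pair for record in records for pair in map_phase(record))
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


print(run_job(["big data big cluster", "data moves to code"]))
```

In a real cluster the map and reduce calls run in parallel on the nodes that hold the data, and the shuffle moves intermediate pairs over the network, which is where much of the tuning effort tends to go.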

255 (PET): A Viable Way to Overcome Performance Problems Right After Systems Roll-outs

Donghyun Park, LG CNS

Room: Aventine F

In most SI (System Integration) projects, performance tests are usually done before system rollouts. But right after the rollouts, significant performance degradations and service failures often occur, leading to customer complaints. Even though we build realistic workload models and test scenarios that consider peak load, why do performance problems still occur? This paper discusses some key reasons for those performance problems and proposes a viable way to overcome them, illustrated with real cases.

Tuesday, November 5, 19:00 - 20:00

CONF: Reception

Wednesday, November 6

Wednesday, November 6, 07:00 - 08:00

CONF: Breakfast

Wednesday, November 6, 08:00 - 09:15

301 (Featured Speaker): All Data is not Created Equal: The Problem of Long-Term Media Storage

PLENARY SESSION

David MacKinnon, NBCUniversal

Room: Aventine A/B

Wednesday, November 6, 09:15 - 09:35

CONF: Break

Wednesday, November 6, 09:35 - 10:35

311 (Featured Speaker): From Hollerith to Hadoop - History of Storage During My Life in IT

Terry Orletsky, Ken Blanchard Companies

Room: Aventine A/B

An entertaining and fact-filled look at how we got from there to here. A byte is still a byte, but there are so many more of them, and they cost so much less than they once did. We are privileged to look at the Digital Age through the eyes of a pioneer and hands-on participant since the 1960s. He presents us with a unique horizontal perspective on the technology of information. The agenda is simple: what went before, what is happening now, and where we might be going. And don't forget the human perspective.

312 (CMG-T): CMG-T: Network Performance Engineering - Part 1

CMG-T

Manoj Nambiar

Room: Aventine C

Although one may not be conscious of it, networks are an integral part of most enterprise systems and applications. It naturally follows that network performance is crucial to overall system performance. Knowing how networks affect applications helps in optimizing application performance and avoiding application blackouts or brownouts.
Participants can expect to learn the following from this session:
•Networks, TCP/IP, their characteristics and how they impact performance
•How can applications be designed and tuned for best network performance?
•Tools for network performance analysis
•Diagnosing application performance using network sniffers
•Network devices available today and their effect on performance
•Network Monitoring
•Network Sizing
A basic understanding of networks and their layered architecture is expected from participants.

313 (APM): I Deployed a Java Application (and Lived to Tell About It)

Over the past 15 years the author has worked with numerous Java applications and has used a number of techniques to pinpoint issues with those applications. This paper describes a number of those techniques that the author has found to be most helpful. The paper includes such topics as solving memory issues and pinpointing problem areas.

314 (CP): The 10 "Elusive Obvious" Facts of Capacity Planning

There are many facts about the capacity planning experience that are widely known but rarely stated out in the open. Those facts are often critical to maximizing the effectiveness of the capacity planning endeavor, but are ignored or minimized in IT life. This paper aims to more concretely state these facts and present solutions to rationalize capacity planning in the enterprise.

Over the past few years, Order Management systems have been subjected to dramatic changes in technology platform and enterprise architecture, and with those changes new performance challenges have emerged. The fast growth of e-commerce has further challenged the scalability and performance of back-end Order Management systems and raised the importance of the stability of the systems involved. This paper presents processes and tools used to test and tune such a complex system, resolving the underlying complexity by decoupling services, batch jobs and reports. We focus on different workloads and on monitoring, tuning and recovery aspects.

Presenter bio: Lakshmi Srinivasa has 15 years of technology experience ranging from Technical Support to Performance Engineering with a focus on Performance Testing/Engineering for the past 10 years. As an IT Manager, Lakshmi leads the Application Performance Engineering service team that supports Performance Test and Engineering activities for all the projects/applications across all the verticals in Staples, US. He laid down the Performance testing methodology for Staples and implemented an APM tool for enhancing PERF Engineering capabilities. His passion and vision is to provide a Center of Excellence for all PERF Test/Engineering and Capacity Planning/Management activities across the globe for Staples.

Wednesday, November 6, 10:35 - 10:45

CONF: Break

Wednesday, November 6, 10:45 - 11:45

321 (Featured Speaker): Capacity Planning for NoSQL

Asya Kamsky, 10gen

Room: Aventine A/B

In this session we will examine how to plan capacity for a MongoDB deployment. After a brief introduction to MongoDB, we will look at how MongoDB uses physical resources and discuss what is involved in planning capacity growth, given MongoDB requirements.

322 (CMG-T): CMG-T: Network Performance Engineering - Part 2

CMG-T

Manoj Nambiar

Room: Aventine C

Although one may not be conscious of it, networks are an integral part of most enterprise systems and applications. It naturally follows that network performance is crucial to overall system performance. Knowing how networks affect applications helps in optimizing application performance and avoiding application blackouts or brownouts.
Participants can expect to learn the following from this session:
•Networks, TCP/IP, their characteristics and how they impact performance
•How can applications be designed and tuned for best network performance?
•Tools for network performance analysis
•Diagnosing application performance using network sniffers
•Network devices available today and their effect on performance
•Network Monitoring
•Network Sizing
A basic understanding of networks and their layered architecture is expected from participants.

323 (PET): PANEL: Performance Requirements: What? When? How?

PANEL

Alex Podelko, Connie Smith, Dan Bartow, Lloyd Williams

Room: Aventine D

A discussion about performance requirements: what, when, and how to gather them, and the place of performance requirements in the software development life cycle.

The performance world has long been concerned with the speed and scalability of its IT infrastructure in relation to user experience. Conventional load testing tools provide great benefits in testing and tuning your applications for optimal performance. However, these traditional performance testing tools were not designed to detail an application's performance from the end-user perspective, a gap that the emergence of web 2.0 and more interactive web applications has exposed. Client-side performance often falls somewhere between the development, functional and performance organizations and is often neglected. This paper will present some best practices for ensuring better perceived application performance from the end user's perspective, and an approach to using client-side measurement tools in conjunction with the Selenium automation software to monitor and track an application's performance from the end user's perspective.

Wednesday, November 6, 12:00 - 13:00

CONF: Lunch

Located in Pavilion (glass building across patio)

Wednesday, November 6, 13:15 - 14:15

331 (Featured Speaker): Get Real: Automated Modeling of Performance for Real Time and Embedded Systems

Connie Smith (Michelson Award Winner - 1986)

Room: Aventine A/B

Performance, both responsiveness and scalability, is an important quality of software. Yet, performance antipatterns often occur in a software architecture/design that are not discovered until testing, when they are difficult and costly to fix. We describe a paradigm that provides developers with quantitative feedback on the performance of architecture and design alternatives that enables them to select alternatives that meet performance requirements. The approach augments model-based engineering with Software Performance Engineering (SPE) models for predicting performance during the early stages of development, before the architecture is fully determined. An overview shows how design models, performance models, and a variety of transformations and tools interact to provide useful results to developers. It presents key components of the framework including analysis specifications as well as the performance model interchange formats: S-PMIF and PMIF. We describe proof-of-concept results.

332 (CMG-T): CMG-T: Java Performance Analysis/Tuning - Part 1

CMG-T

Peter Johnson, Unisys Corporation (Mullen Award Winner - 2006)

Room: Aventine C

333 (CP): Better Prediction Using the Super-Serial Scalability Law Explained by the Least Square Error Principle and the Machine Repairman Model

Jayanta Choudhury, TeamQuest Corporation

Room: Aventine D

The implications of no-coupling as a way to model "loose coupling" between coherency latency and contention latency in the Universal Scalability Law are analyzed using the generalized machine repairman model of Gunther. This analysis provides a theoretical explanation of better prediction by the Super-Serial Scalability Law compared to the Universal Scalability Law. The deviation of model predictions from measurements at the tail of both the Universal Scalability Law and the Super-Serial Scalability Law is explained using the linear load-dependent service time of Gunther's generalized "machine repairman model". Potential limitations and new possibilities of these findings are discussed. Two sample data sets, collected by Schwartz in 2010, are used to experiment and compare the prediction reliability of the Super-Serial Scalability Law and the Universal Scalability Law. The results support the theoretical claim that the Super-Serial Scalability Law provides a better model than the Universal Scalability Law.
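For readers unfamiliar with the baseline model, the Universal Scalability Law is commonly written as C(N) = N / (1 + sigma (N - 1) + kappa N (N - 1)), where sigma is the contention coefficient and kappa the coherency coefficient. The sketch below evaluates that form and scores a parameter pair by its sum of squared errors, which is the quantity a least-squares fit minimizes. The function names and sample numbers are illustrative assumptions, not the paper's code or data, and the Super-Serial form is deliberately not reproduced here.

```python
import math


def usl_capacity(n, sigma, kappa):
    """Relative capacity C(N) under the Universal Scalability Law."""
    return n / (1.0 + sigma * (n - 1) + kappa * n * (n - 1))


def usl_peak(sigma, kappa):
    """Load N* at which C(N) is maximal: sqrt((1 - sigma) / kappa), kappa > 0."""
    return math.sqrt((1.0 - sigma) / kappa)


def sum_squared_error(points, sigma, kappa):
    """Least-squares objective: sum of (measured - predicted)^2 over all points."""
    return sum((c - usl_capacity(n, sigma, kappa)) ** 2 for n, c in points)


# Hypothetical (N, relative capacity) measurements, invented for illustration.
sample = [(1, 1.0), (2, 1.9), (4, 3.4), (8, 5.3)]
print("peak load:", usl_peak(0.05, 0.001))
print("squared error:", sum_squared_error(sample, 0.05, 0.001))
```

Comparing the squared error of two candidate models on the same measurements is one simple way to judge which predicts better, in the spirit of the comparison the abstract describes.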

Presenter bio: Dr. Jayanta Choudhury is a technology researcher at TeamQuest Corporation, focusing on capacity planning and performance modeling for IT resource optimization. He has been presenting at CMG International conference every year since 2012. He received a Ph.D in applied mathematics and an MS in computer engineering from the University of Louisiana at Lafayette, in 2008 and 2002 respectively. His research interests include performance modeling, capacity planning, operations research, high performance computing, algorithm development, data analysis, numerical analysis, and numerical solution of PDEs, ODEs and their applications.

334 (PET): The Measurement Based Model to Study the Affect of Increase in Data Size on Database Query Response Time

Rekha Singhal, TCS

Room: Aventine E

In a typical database application environment, database queries account for a major share of an application's response time. A database query's elapsed response time (ERT) primarily consists of time spent on the disk subsystem (i.e. IO access time) and CPU processing, both of which change as the size of the database changes. The IO access time is a major contributor to a query's elapsed response time. Therefore, in this paper, we present a measurement based technique to model the IO access behavior of a database query, which may be used to predict the IO access time of the query as the size of the database changes. We present a query taxonomy based on a query's different modes of table access, which may impact its IO access time. We have verified the model by conducting measurements on a real system for synthetic queries based on TPC-H benchmarks, and we present the results.

Presenter bio: Dr. Rekha Singhal has 20 years of research and teaching experience. She has worked with the CDAC and TRDDC research centers. Recently, one of CDAC's products, Revival 2000, developed under her guidance, received the NASSCOM Technology award. She has numerous publications in both national and international conferences and journals. She has filed patents in India. She has taught BE, ME, MCA and MBA students in prestigious institutes such as TISS and NITIE. Her research interests are Query Performance Prediction, Database System optimization, Database Distributed systems, Storage Area Networks, TCP/IP networks and Health IT. She holds a Ph.D. and an M.Tech. from IIT Delhi. Currently she is working as a Senior Scientist with TCS Innovation Labs, Mumbai.

System monitoring platforms offer advanced functionality to help IT organizations manage their resources and understand system behavior. These platforms collect, act on, and visualize data about running systems. The performance of a monitoring system under a given workload must be understood in advance in order to achieve identified performance objectives. Myriad system monitoring platforms exist for implementation by organizations, each offering distinguishing features. Most exhibit a similar system architecture consisting of a central monitoring server, a monitoring datastore, and hosts and services to monitor. Scheduling, obtaining and persisting the status of monitored hosts and services forms the primary workload placed on the monitoring system. This paper presents a closed queuing network (QN) model that predicts response time and utilization for this primary workload placed on a monitoring system. The model accounts for monitoring frequency, CPU and disk consumption at the monitoring server, and response time of the monitored system. Experiments conducted with the open source Nagios monitoring platform validate the model predictions across several workloads.
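The abstract does not say how the closed queuing network is solved; exact Mean Value Analysis (MVA) is one standard technique for single-class closed networks and is sketched below under that assumption. Treating each monitored service as a circulating customer and the interval between checks as think time is also an assumption of this sketch, and all numbers are hypothetical.

```python
def mva(demands, think_time, customers):
    """Exact Mean Value Analysis for a single-class closed queuing network.

    demands    -- service demand in seconds at each queueing station
    think_time -- delay in seconds between successive requests of a customer
    customers  -- number of circulating customers, at least 1
    Returns (response_time, throughput, utilizations).
    """
    if customers < 1:
        raise ValueError("customers must be at least 1")
    queue = [0.0] * len(demands)  # mean queue lengths for an empty network
    for n in range(1, customers + 1):
        # Arrival theorem: an arriving customer sees the queues left by n - 1 others.
        residence = [d * (1.0 + q) for d, q in zip(demands, queue)]
        response_time = sum(residence)
        throughput = n / (think_time + response_time)
        # Little's law applied per station gives the new mean queue lengths.
        queue = [throughput * r for r in residence]
    utilizations = [throughput * d for d in demands]
    return response_time, throughput, utilizations


# Hypothetical workload: 20 ms CPU and 30 ms disk per check, 60 s between
# checks, 500 monitored services.
r, x, u = mva([0.020, 0.030], 60.0, 500)
print(f"response time {r:.3f} s, throughput {x:.2f} checks/s, utilization {u}")
```

Rerunning the function with a shorter check interval or more monitored services shows how quickly the busiest station approaches saturation.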

Presenter bio: I am a senior information technologist with experience progressing from software development to solution architecture. I lead technical teams of various sizes. My interests and strengths lie at the nexus of web services, application integration, business process management and workflow. I have a B.S. with Distinction in Operations Research (OR) from Cornell University, and am currently working towards an M.S. in OR from George Mason University.

Wednesday, November 6, 14:15 - 14:25

CONF: Break

Wednesday, November 6, 14:25 - 15:25

341 (Featured Speaker): The Seven Deadly Sins of Technical Presentations and How to Resist Them

Denise Kalm, Kalm Kreative, Inc.

Room: Aventine A/B

What's the difference between a great technical presentation and a poor one? It's easy to know that you didn't get what you wanted out of a session, but it may be harder to understand why. As you prepare your own presentations, knowing the key elements to a great speech can not only help you deliver the message you intended, but also set you apart as a key contributor. Learn how to avoid the mistakes others make and bring your delivery to a new level. Get the message out there, successfully.

Presenter bio: Denise P. Kalm is the Chief Innovator at Kalm Kreative, Inc., a marketing services organization. Her experience as a performance analyst/capacity planner, software consultant, and then marketing maven at various software companies grounds her work providing contract writing, editing, marketing and speaking services. She is a frequently published author in both the IT world and outside and has 3 books: Lifestorm, Career Savvy-Keeping & Transforming Your Job, Tech Grief - Survive & Thrive Thru Career Losses (with L. Donovan). Kalm is a requested speaker at such venues as SHARE, CMG and ITFMA and has enhanced her skills through Toastmasters where she has earned her ACG/ALB. She is also a personal coach at DPK Coaching.

342 (CMG-T): CMG-T: Java Performance Analysis/Tuning - Part 2

CMG-T

Peter Johnson, Unisys Corporation (Mullen Award Winner - 2006)

Room: Aventine C

343 (PET): PANEL: Is Load Testing in Crisis?

PANEL

Alex Podelko, Dan Bartow, Robert Buffone, Lakshmi Srinivasa

Room: Aventine D

Is load testing in crisis? What are today's issues and how serious are they? Why do many Internet startups not do load testing? What is the future of load testing?

Analyzing processor performance metrics to characterize a workload involves collecting a significant quantity of hardware performance metrics data for a given software workload. These collected metrics are characteristic of a typical big data environment with a large feature set. Though analysis has been performed on these metrics using standard statistical procedures, distributed big data analytic models are seldom applied in this context. In this experiment we build an innovative model that uses advanced statistical and machine learning algorithms for quantitative analysis. The model learns relationships between the attributes and captures the natural structure of the data. It uses an unsupervised clustering algorithm such as K-means to divide the data into groups that represent phases in a workload. The paper addresses the challenges of dealing with high-dimensional performance data. The analytical model is a systematic approach to monitoring and analyzing a large amount of performance metrics in order to learn and classify workloads and detect phases.
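The clustering step can be illustrated with a small, plain-Python version of Lloyd's K-means algorithm. This is a sketch only: the two metrics, the sample values and the starting centroids are invented, and the model in the paper is distributed and works on far more dimensions.

```python
import math
from itertools import groupby


def kmeans(points, centroids, iterations=20):
    """Lloyd's algorithm from given starting centroids; returns (centroids, labels)."""
    centroids = [tuple(c) for c in centroids]
    labels = []
    for _ in range(iterations):
        # Assignment step: label each point with the index of its nearest centroid.
        labels = [min(range(len(centroids)), key=lambda j: math.dist(p, centroids[j]))
                  for p in points]
        # Update step: move each centroid to the mean of its members.
        updated = []
        for j, old in enumerate(centroids):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                updated.append(tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                updated.append(old)  # keep an empty cluster's centroid in place
        if updated == centroids:
            break  # converged: no centroid moved
        centroids = updated
    return centroids, labels


def phases(labels):
    """Collapse consecutive identical labels into (label, run_length) pairs."""
    return [(lab, len(list(run))) for lab, run in groupby(labels)]


# Hypothetical per-interval samples: (instructions per cycle, cache miss rate).
samples = [(1.8, 0.02), (1.7, 0.03), (1.9, 0.02), (0.6, 0.20), (0.5, 0.22), (0.7, 0.18)]
centers, labels = kmeans(samples, [samples[0], samples[3]])
print(phases(labels))
```

Because the samples are ordered in time, runs of the same cluster label can be read as candidate workload phases.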

345 (ITSM): Alternative Metrics for Servers RFPs

Late Breaking

Joseph Temple, IBM

Room: Aventine F

It is becoming increasingly clear that common metrics based on standard benchmarks are a poor way to specify qualification for RFPs. Fundamentally, the benchmarks measure Internal Throughput Rate (ITR), the maximum rate at which a machine can do work. Businesses do not measure ITR but rather measure how much business is done over periods for which the ITR is rarely sustained (daily, weekly, monthly, quarterly, annually). Thus ETR (ITR x Utilization) has a better connection to the business metrics. Also, users rarely experience either throughput rate but rather experience response time, which in most cases has a highly non-linear relationship to throughput. In this paper we propose a model which allows specification of two or three parameters for qualification, with relative response time as an evaluation criterion. Including a second parameter, and defining throughput based on machine parameters rather than assuming that a specific benchmark result is representative of the desired characteristics, should result in better sizings and better RFP creation.
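The relationships mentioned above can be put into numbers with a short sketch. ETR = ITR x Utilization comes from the abstract; the response time curve uses a single M/M/1 queue, which is an assumption made here for illustration and not necessarily the model proposed in the paper. The two candidate machines are hypothetical.

```python
def external_throughput_rate(itr, utilization):
    """ETR = ITR x utilization, as defined in the abstract."""
    return itr * utilization


def relative_response_time(utilization):
    """Response time divided by service time for an M/M/1 queue (assumed model)."""
    if not 0.0 <= utilization < 1.0:
        raise ValueError("utilization must be in [0, 1)")
    return 1.0 / (1.0 - utilization)


# Hypothetical candidates: (name, ITR in work units per second, sustained utilization).
candidates = [("A", 1000.0, 0.40), ("B", 600.0, 0.80)]
for name, itr, util in candidates:
    print(name, external_throughput_rate(itr, util), relative_response_time(util))
```

With these made-up numbers, machine B sustains more work (an ETR of 480 against 400) despite its lower ITR, but under the M/M/1 assumption its relative response time is about 5, against roughly 1.67 for machine A.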

Wednesday, November 6, 15:25 - 15:45

CONF: Break

Wednesday, November 6, 15:45 - 16:45

351 (PET): Large Scale Performance Testing from the Cloud

Late Breaking

Dan Bartow, SOASTA

Room: Aventine A/B

What do nine out of the top ten online retailers have in common? What were only 10% of organizations planning to do four years ago that has shifted to nearly 90% of those surveyed today? The answer is production performance testing from the cloud. More specifically, it's the practice of provisioning cloud based load generators outside of the firewall of a production application and using them to generate large scale traffic. This process ensures optimal performance and capacity for peak. It takes the guess work out of production capacity modeling and adds a high confidence verification phase that ensures all systems are go for the biggest traffic windows of the year. This practice has evolved into a critical component of best-in-class online applications. In this session, attendees will learn about this technique from the engineer formally credited with pioneering and heavily evangelizing it in the industry. Learn how to mitigate the foremost concerns with testing in production: security, test data in production, and potential live customer impact.

352 (CMG-T): CMG-T: Java Performance Analysis/Tuning - Part 3

CMG-T

Peter Johnson, Unisys Corporation (Mullen Award Winner - 2006)

Room: Aventine C

353 (PET): Panel: Performance Engineering Body of Knowledge

PANEL

Alex Podelko, Peter van Eijk, Walter Kuketz, Daniel Menasce

Room: Aventine D

What should every performance engineer know? Do we need a Performance Engineering Body of Knowledge?

Given the continuous growth of users and traffic, a CDN (Content Delivery Network) service can be a good way to improve performance for users and maximize the return on network infrastructure investment. However, when changing CDN service providers, an unexpected network bottleneck can occur, because the large volume of traffic handled by the existing CDN temporarily flows into the origin server while the new CDN is filling its cache. In such circumstances, we can respond effectively by using conventional Internet routers or LAN switches that support QoS (Quality of Service). If you are looking for an effective method that is low in cost and simple to implement, the server's NIC (Network Interface Card) is a good alternative for controlling the inflow of network traffic into the server. Moreover, this is a practical approach that can be applied to a wide range of anticipated traffic congestion that could cause a network bottleneck. In this paper, I discuss the points to consider when applying this method.

Computer Measurement Group has been around for a long time and may have lost some of its original vision. The need for changes has become manifest and the whole management team is now working together as a unit to make them effective. It now needs new visionaries who can define primary goals and then focus on key deliverables, to make the most of the limited volunteer effort it has, living within its means and yet providing more significant member benefits to resuscitate the membership.

Wednesday, November 6, 16:45 - 17:00

CONF: Break

Wednesday, November 6, 17:00 - 18:00

362 (CONF): Can Capacity and Finance Co-Exist?

Late Breaking

John Baker

Room: Aventine C

There is a never-ending conflict in datacenters all over the world: Capacity Planning says what you need and Finance says what you get. The key is to find that middle ground. If capacity is too low, service will be poor and business will suffer. If capacity is too high, the resulting costs cause the business to suffer as well.
Hardware costs are fairly straightforward; it is the monthly software costs (MLC) that can be significant. IBM provides options to pay only for what you use via sub-capacity pricing. Many shops have avoided this option, however, as it requires a "cap" to be set and enforced, potentially constraining important workloads.
What is needed is a methodology to track the ongoing utilization of the systems and gradually reduce the resource consumption of selected (low priority) workloads so that you can get close to the cap but not hit it and suffer delays in service. Could Capacity and Finance co-exist? Could there actually be peace and harmony? Come listen and judge for yourself.

363 (CP): What the Vendor Says and What the Vendor Really Means: 10 things I've Learnt about Buying Software since I Stopped Being a User and Started Working for a Vendor

Phillip Bell, Metron Technology

Room: Aventine D

The generally accepted wisdom is that you should define your capacity management process, and then look at the software you will need to support it. There are lots of papers on how to define your process, but not much help on how to choose your software tool(s) afterwards. There are many tools out there to look at, and each vendor will tell you why theirs is the best in the market. But how do you pick the best one for you? How do you interpret what the vendor is really saying? And how do you justify the initial cost to the business? Having been both a user and vendor, I intend to use what I have learnt along the way, to help you answer those questions and more.

Presenter bio: I first started working in the capacity management field in 2000. Initially I was involved with a product for Unisys Mainframes, and through various roles as both a user and vendor of capacity planning and management software, I have spread my experience over everything from AS400 to VMware.

364 (PET): Performance Testing of NoSQL Applications

This paper discusses the approach and solutions to performance test NoSQL databases. It also includes the available tools and how they can be used for this purpose taking Cassandra as an example. The paper also highlights the areas that should be looked at to get optimum performance from the Cassandra clusters in the application.

Presenter bio: Mustufa is an experienced, delivery-focused technology specialist with close to 10 years of development experience. He is currently working as a Performance Architect at Impetus Technologies. He brings diverse technology solution experience in design, development, testing and deployment. His expertise includes performance engineering for software products, bottleneck identification and diagnostics, profiling and tuning app and database servers. His recent focus has been on cloud computing, big data and mobile performance.

This paper presents the author's experiences in performance troubleshooting and optimization of a large logistics ERP system. The system needed to process half a million shipments, leading to twelve million operations per day. The optimizations not only helped production usage run smoothly but also enabled a twofold increase in the number of users. No system performance testing was done before going into production. Hence, the major challenge in this work was that performance optimization had to be done directly on the production system, as there was no test environment. We present the design improvements that were made to maximize the performance improvement with minimal impact on the application functionality.

Presenter bio: Mr. Santosh Kangane works with Persistent Systems Ltd, Pune, India and currently holds the position of Module Lead (Database performance DBA). Santosh's focus areas are application design, development and performance engineering. He has worked on Oracle 10g/11g, Oracle 11g RAC, Microsoft SQL Server, PostgreSQL, MySQL, and Greenplum MPP databases for more than 4 years.

Wednesday, November 6, 18:00 - 19:00

CONF: BOFS

See BOFS Sign Up Sheets at Session Control for details.

Rooms: Aventine A/B, Aventine C, Aventine E, Aventine F

CONF: Regional Officers/Vice-Presidents Meeting

BOF

Dave Thorn, Denise Kalm

Room: Aventine D

Wednesday, November 6, 19:00 - 20:00

CONF: Reception

Thursday, November 7

Thursday, November 7, 07:00 - 08:00

CONF: Breakfast

Thursday, November 7, 08:00 - 09:15

All capacity planners with decades-long careers have seen languages come and go, and great ideas like object-oriented code, change control, ITIL and many others rise to prominence in the literature (well, nowadays blogs) and then fall to the dusty sidelines, not because they were bad ideas, but rather because they exceeded the detail bandwidth of the average (and below average) coder. The speed at which code production was outsourced to faraway places long ago exceeded the speed at which the great programming ideas and best practices reached these same places.
The result is that the vast clouds of hardware now available, and the huge bandwidth available in many firms, are all too often choking on the coding and logic decisions made by coders selected by "cost per hour" rather than skills and experience.

Attempting to improve all the skills of all coders on the planet is a nonsensical goal. We need instead to recognize the common patterns of coding failure, and become much better at automatically detecting and correcting issues early in the code development phase, not after the monthly reports fail to run in the allotted time for the third month in a row.

Thursday, November 7, 09:15 - 09:35

CONF: Break

Thursday, November 7, 09:35 - 10:35

Capacity planning has been a well-established practice for over 30 years. During that time, the tools, techniques and processes have been defined and refined. However, our traditional approach cannot keep pace with today's rapidly changing environments; we need to revolutionize the practice of capacity planning.

This paper will examine the current scope and focus of capacity planning and propose an innovative methodology to evaluate, predict and plan for the all-inclusive Digital Infrastructure. It is no longer sufficient to utilize yesterday's outmoded approach when planning for tomorrow's applications, systems and facilities infrastructures. We need to revolutionize the practice of capacity planning. This paper identifies the goals and challenges of Digital Infrastructure capacity planning and defines a new approach that adapts to tomorrow's extraordinarily dynamic, diverse and expanding environments.

412 (CP): Business and Application Based Capacity Management

Late Breaking

Ann Dowling, IBM; Clea Zolotow, IBM

Room: Aventine C

413 (APM): On the Applicability of Subsystem Auto-Tiering in the High Performance Database Environments and Other I/O

Anthony Mungal, EMC Corporation

Room: Aventine D

This paper reviews many of the more advanced concepts and constructs of DASD I/O subsystem auto-tiering and its impact on configuring high performance databases and other I/O. The new DASD subsystem environment is mainly characterized by the presence of virtual pools enabled through storage virtualization, multiple drive technologies which enable tiered storage pool structures, wide striping across multiple disk spindles for raw device level performance, massive central caches, and performance policies akin to a workload manager concept providing for almost autonomic management of both active and inactive data. In fact, auto-tiering is having a profoundly positive impact on performance analysis and capacity planning in many ways. For starters, it enhances the analyst's or storage administrator's ability to configure storage, respond to ad hoc storage requests using virtual provisioning, "actively" manage performance, optimize capacity using oversubscription, reduce cost through the use of high capacity SATA drives, and more. Although quantitative observations and analysis will be drawn from a very high I/O DB2 environment running on AIX, much of the observed subsystem behavior and lessons learned can be applied to databases and other I/O operating in any of the other major operating environments such as z/OS, Linux, Windows or ESX.

414 (CP): Key Metrics for Effective Storage Performance and Capacity

Doing capacity management for storage can be difficult with the many complex and varied technologies being used. Given all of the options available for data storage strategy, a clear understanding of the architecture is important in identifying performance and capacity concerns. A technician looking at metrics on a server is often seeing only the tip of a storage iceberg. Knowing which metrics are important will depend on your objectives and storage architecture, but response and space utilization will always be key to effectively managing storage.

Presenter bio: Charles Johnson has been in the Information Technology industry for over 30 years. This has included working at a number of Fortune 500 organizations in different business sectors, including insurance, financial and automotive. Charles has been involved in Performance and Capacity for z/OS for the majority of his career, both as a technician and manager. Charles is currently a Principal Consultant with Metron-Athene, Inc., a worldwide software organization specializing in Performance and Capacity Management.

In this paper, we present the benefits of performance testing, forms of performance testing, and key success factors, and provide a framework to build a business case for Performance Testing and Application Performance Monitoring. It will be beneficial for beginner Performance Engineers and help close gaps for existing engineers by illustrating some best practices and guidelines for successful performance testing and for building a Performance Testing Center of Excellence.

Presenter bio: Performance Engineering Architect

Thursday, November 7, 10:35 - 10:45

CONF: Break

Thursday, November 7, 10:45 - 11:45

In this talk, we will present an empirical study of over 10,000 web sites, analyzing the correlation between: site content/complexity, search rankings, site performance, and user engagement. Using a global network of real browsers, we captured site metrics for performance, user experience, and content complexity (looking at different styles of sites with a wide range of content types like javascript files, images, media, and social widgets). We then appended statistics on the site's search engine rank, social rank, traffic volume, and user engagement (visits/month, pages/visit). The results show statistically significant correlations - some positive, some negative, some quite surprising - among the factors studied. In this talk we will show you the data we collected, and talk about what we found: what works on web sites and what doesn't, how site content drives engagement metrics, and performance techniques for optimizing site traffic, engagement, and conversions.

Establishing relative server capacity has been an age old problem brought about by differing platform design points and continuously evolving platform generations. Many theoretical models have been attempted to solve this problem. A model that has been previously proposed and adopted looks at two key metrics - thread speed and throughput - as its building blocks. That model has been quite effective in conceptual and theoretical discussions, but lacked a connection with benchmark data. This late breaking paper attempts to bridge the theoretical model with the world of real world test case measurements. Instead of using theoretical thread speed and throughput numbers based on platform specs, the paper looks at the impact of adjusting those metrics based on benchmark observations. We look at examples of how the relative capacity model can be refined to give a better indication of whether a particular workload "fits" a particular platform well.

423 (ITSM): PANEL: Starting Your Own Company in the Performance & Capacity Field

The session will feature panelists who are Michelson Award winners and distinguished CMG officers. Besides being known for their technical contributions to the field, each one of them has been a successful entrepreneur. The panelists will share their experiences in starting their own company, and will offer their advice to junior CMG attendees who wish to start their own company. This will be a highly interactive session with questions from members of the audience who are just becoming entrepreneurs.

About 15 years ago, clusters of commodity microprocessors largely overtook custom-designed systems as the high performance computing (HPC) platform of choice. The design and optimization of workload scheduling systems for these clusters has been an active research area. This paper surveys some representative examples of workload scheduling methods used in contemporary large-scale applications such as Google, Yahoo, Facebook, and Amazon that employ a MapReduce parallel processing framework. It examines a specific MapReduce framework, Hadoop, in some detail. The paper describes a novel dynamic prioritization, self-tuning workload scheduling approach, and provides simulation results that suggest this approach will improve performance compared to the standard Hadoop scheduling algorithm.

425 (PET): Data Distributions, Thresholds, and Heavy Tails

The intention of this paper is to provide a better understanding of how non-normal performance data behaves in general-purpose computing environments and how that knowledge can improve our performance analysis. There is a large body of evidence showing that non-normal data distributions (known as having long, fat, or heavy tails) are common in computer systems, including but not limited to: job service times, process execution times, sizes of files both stored on Web servers and transferred through the internet and I/O traces of disk and tape activity. If general computer metrics data is not normally distributed, what then? Does it matter? How extensive are these non-normal distributions? What impact do they have? Can we learn anything from them? To help answer these questions, a large number of Perfmon counters were examined in terms of data distributions. Comparisons of statistics such as skewness, length of tail, and standard deviation (often abbreviated with the Greek letter sigma σ) were made between actual distributions and assumed normal distributions. The analysis of data distributions provided an increased understanding of how performance data behaves. This understanding led to ways to improve the process of threshold determination that is independent of the underlying data distribution. Because the process is simple it is possible to automate setting thresholds for large numbers of Perfmon counters. This in turn allows concurrent reporting on a wide range of Perfmon counters whose thresholds are exceeded at the same time. The relationships between different counters become immediately visible.
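As a rough illustration of why the choice of threshold matters for heavy-tailed data, the sketch below compares a percentile-based threshold with the familiar mean-plus-k-sigma rule. The abstract does not say which distribution-independent method the paper uses; the nearest-rank percentile here is an assumption, and the sample values and function names are hypothetical.

```python
import statistics

def percentile_threshold(samples, pct=95):
    """Nearest-rank percentile: smallest sample with at least pct% of the data at or below it.

    pct is an integer percentage (1..100). No distribution shape is assumed.
    """
    if not samples:
        raise ValueError("need at least one sample")
    ordered = sorted(samples)
    rank = max(1, (pct * len(ordered) + 99) // 100)  # integer ceiling of pct*n/100
    return ordered[rank - 1]

def sigma_threshold(samples, k=2.0):
    """Classic mean + k*sigma threshold, which implicitly assumes near-normal data."""
    return statistics.mean(samples) + k * statistics.pstdev(samples)

# Hypothetical heavy-tailed counter: mostly small values plus a few large spikes.
samples = [1] * 90 + [2] * 5 + [5, 10, 50, 200, 1000]

p95 = percentile_threshold(samples, 95)
p99 = percentile_threshold(samples, 99)
two_sigma = sigma_threshold(samples, 2.0)

print("95th percentile threshold:", p95)
print("99th percentile threshold:", p99)
print("mean + 2 sigma threshold:", round(two_sigma, 2))
```

With data like this, a single extreme value inflates the standard deviation, so the sigma-based threshold lands far out in the tail, while the percentile thresholds track where the bulk of the observations actually sit.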

Thursday, November 7, 12:00 - 13:00

CONF: Lunch

Located in Pavilion (glass building across patio)

Thursday, November 7, 13:15 - 14:15

We apply a design of experiments methodology to study the impact of popular front end optimization (FEO) techniques on web performance. For a broad sample of web pages, we characterize the performance impact of techniques including: CSS, javascript and image combination; inlining images and javascripts; client-side cache; image compression; code minification; asynchronous loading of scripts; asset prefetching; and others. In addition to assessing the impact of each factor, we identify key interactions among factors, and propose heuristics for choosing optimization strategies as a function of content and complexity.

433 (APM): Best Practices Every Database Performance Engineer Should Know

Performance is frequently one of the last things on people's minds when developing a system. Unfortunately, that means it becomes the biggest problem after that system goes into production. Database performance contributes directly to most of the overall application response time. The CPU, the disk, the memory, the number of users, the amount of data, the type of indexing and writing the best T-SQL are discussed in detail with practical examples. This paper also focuses on the latest methodologies for performance testing SQL Server databases and for monitoring, identifying and resolving performance issues.
It covers setting up performance monitors to collect detailed information about the utilization of operating system resources, including memory, disk, processor and the network. In addition, several other metrics are discussed to address a variety of functional areas within the database, such as dynamic management views (DMVs) and dynamic management objects (DMOs), which read the current allocations out of SQL Server memory.
Indexing is one of the best ways to reduce disk I/O. The paper discusses the benefits and overhead of an index, with general recommendations for index design; clustered and nonclustered index behavior and comparisons; recommendations for clustered and nonclustered indexes; and advanced indexing techniques such as covering indexes, index intersections, index joins, filtered indexes, distributed indexes and columnstore indexes. Recommendations for Database Engine Tuning, lookup, statistics and fragmentation analysis are also discussed.

For any large IT system with a complex architecture and robust business growth, predicting and provisioning future capacity is a challenge for the business. Over-provisioning infrastructure capacity reduces the cost effectiveness of an organization's IT, while under-provisioning has a direct impact on the business, leading to financial and brand value losses.
To address this problem for an existing enterprise application, one approach is to develop a performance profile from historical infrastructure and application usage data, which can serve as the foundation for further performance tuning and capacity planning studies.
This paper illustrates one such solution, in which a predictive capacity model was built by coupling application and system usage data with business usage estimates to address the capacity provisioning challenges of a retail division of a large telecom services provider.
During this exercise, historical workload and application utilization data were mapped onto a configurable time interval and analyzed using basic performance engineering principles (the utilization law) and statistical analysis techniques (correlation and linear regression) to derive rules that relate utilization to workload and background load. These rules were later fitted into an Excel-based UI model, which estimated the capacity based on identified sets of business and technical workload inputs.
This paper highlights the strategy, approach and considerations in executing the project, covering the technical challenges in building the application performance profile, mapping it to the business usage data to derive the scaling rules, and embedding them in the UI-based capacity model, as well as the accuracy achieved and the benefits to the customer. It also comments on the future roadmap for such a model in terms of scope for refinement and maintenance.
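The kind of rule described above, relating utilization to workload plus a background load, can be sketched with an ordinary least-squares line. This is a minimal sketch under stated assumptions, not the paper's model: the data points, variable names and the 75% planning ceiling are all hypothetical. Under the utilization law (U = X * D), the fitted slope plays the role of the service demand per transaction and the intercept the background load.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical interval data: transactions per second and CPU utilization (0..1).
throughput = [10, 20, 30, 40, 50]
utilization = [0.15, 0.25, 0.35, 0.45, 0.55]

# Slope ~ CPU seconds consumed per transaction; intercept ~ background load.
service_demand, background = fit_line(throughput, utilization)

def predicted_utilization(tps):
    """Scaling rule: background load plus workload-driven utilization."""
    return background + service_demand * tps

def max_throughput(utilization_ceiling):
    """Largest workload that keeps utilization at or below the ceiling."""
    return (utilization_ceiling - background) / service_demand

print("service demand (s/txn):", round(service_demand, 4))
print("background load:", round(background, 4))
print("throughput at 75% ceiling:", round(max_throughput(0.75), 1))
```

A spreadsheet model of the sort the abstract mentions would hold the same two coefficients per resource and apply them to forecast business volumes.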

Performance testing of web-based applications has been addressed with the help of standard load testing tools, and the methodology is well established and adopted by the software industry. With the advent of distributed Service Oriented Architecture (SOA) applications, the load testing of services poses its own challenges to performance testers and engineers. In particular, this paper presents an approach and a tool by which the challenges of performance testing a messaging service (services using SOAP over MQ) can be overcome. Further, the paper illustrates how existing tools can be adapted, or new tools used, to test such services. It also covers the associated testing, monitoring and tuning aspects.

Presenter bio: Lakshmi Srinivasa has 15 years of technology experience ranging from Technical Support to Performance Engineering, with a focus on Performance Testing/Engineering for the past 10 years. As an IT Manager, Lakshmi leads the Application Performance Engineering service team that supports Performance Test and Engineering activities for all the projects/applications across all the verticals in Staples, US. He laid down the performance testing methodology for Staples and implemented an APM tool to enhance its performance engineering capabilities. His passion and vision is to provide a Center of Excellence for all performance test/engineering and capacity planning/management activities across the globe for Staples.

Thursday, November 7, 14:15 - 14:25

CONF: Break

Thursday, November 7, 14:25 - 15:25

441 (PET): Margins for Error in Performance Analysis

Performance analysis tests are very sensitive to the accuracy of details, test conditions, and parameters. No test is perfect, but margins for error in performance testing introduce RISK into the testing, cause DELAYS in solving problems, and in fact often lead to false solutions to non-existent problems. This paper presents two actual customer case studies which illustrate common ways in which the results of a software performance analysis effort can be derailed. One is an interactive HTML application, the other is a batch job. The main topics emphasized are faulty problem definitions and thinking outside the box, with numerous other concepts addressed.

The changes to the IT infrastructures of our organizations, and to the service delivery paradigms on which they are based, are profound and coming at an accelerating rate. This raises two broad questions this panel will address. 1) What do these changes portend for the Capacity Management profession and how it delivers high value to organizations? 2) What additional skills will be required for practitioners to be high-value employees in their organization?

444 (APM): A Stepwise Approach to Software-Hardware Performance Co-Optimization Using Design of Experiments

Keerthi Palanivel, Univ. of Minnesota

Room: Aventine E

Configurations of hardware and software parameters play a major role in determining the performance of an application program. With increasingly complex systems, the task of finding the optimal configuration parameters becomes very difficult. When a performance analyst needs to find the best hardware and software settings to run an application, the traditional approach of trying different combinations, such as trial and error and tuning one factor at-a-time, might not lead to optimal results. We present a methodology for applying Design of Experiments (DOE) techniques to vary the software and hardware factors in a systematic manner. We show how to use a Plackett and Burman (PB) analysis to find the main factors followed by a full-factorial design using Analysis of Variance (ANOVA) and the F-test to statistically quantify the effects of the main parameters and interactions. This systematic approach reduces the time and cost needed to select the software and hardware configurations to optimize a software application on a new or existing platform. We demonstrate with two case studies how this approach can be applied to software-hardware co-validation. DOE is also applied to select optimal parameters for a key enterprise server workload on early platforms. The authors hope to share their excitement about the opportunities DOE provides to reduce the time taken for performing hardware and software tuning to improve application performance and human productivity.
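To make the idea of systematically varying factors concrete, here is a minimal sketch of a two-level full-factorial experiment that estimates the main effect of each factor. It is not the Plackett and Burman screening design or the ANOVA/F-test analysis the paper uses; it only shows the simpler building block. The factor names and the synthetic response function are hypothetical stand-ins for real benchmark runs.

```python
from itertools import product

def main_effects(factors, response):
    """Run a 2-level full-factorial design and return each factor's main effect.

    Each factor is coded -1 (low) or +1 (high). The main effect is the mean
    response at the high level minus the mean response at the low level.
    """
    runs = list(product((-1, +1), repeat=len(factors)))
    results = [response(dict(zip(factors, run))) for run in runs]
    effects = {}
    for i, name in enumerate(factors):
        high = [y for run, y in zip(runs, results) if run[i] == +1]
        low = [y for run, y in zip(runs, results) if run[i] == -1]
        effects[name] = sum(high) / len(high) - sum(low) / len(low)
    return effects

def synthetic_throughput(levels):
    """Hypothetical response: stands in for measuring a real workload."""
    t, c = levels["threads"], levels["cache"]
    # 'prefetch' is deliberately given no influence on the result.
    return 100 + 10 * t + 2 * c + 3 * t * c

factors = ["threads", "cache", "prefetch"]
effects = main_effects(factors, synthetic_throughput)
for name in sorted(effects, key=lambda f: abs(effects[f]), reverse=True):
    print(name, effects[name])
```

Ranking factors by the magnitude of their effect is the step that lets an analyst drop unimportant parameters before spending time on a more detailed design for the remaining ones.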

445 (CP): Performance and Capacity: New features with vSphere 5.1 and 5.5

Late Breaking

Ellen Friedman

Room: Aventine F

vSphere 5.1 was announced last year and vSphere 5.5 was just announced at VMworld 2013. Both offer significant improvements in reliability, availability, scalability and performance, making vSphere much more viable for mission-critical Tier-1 applications. With its scalability improvements, vSphere 5.5 can now support VMs with up to 64 vCPUs and 1 TB of memory. Network and storage I/O improvements, as well as latency improvements, provide a compelling argument for server virtualization. This presentation will review the new performance features and present considerations and metrics to review for optimized performance.

Thursday, November 7, 15:25 - 15:45

CONF: Break

Thursday, November 7, 15:45 - 16:45

451 (ITSM): It's Not Always What You Say...

Late Breaking

Ron Kaminski (Mullen Award Winner - 2003)

Room: Aventine A/B

All capacity planners face challenges in getting our firms to understand and act on what we discover and our communications styles have to adapt to different corporate cultures to be successful. Join Ron as he shares what tricks and techniques he has learned in 28+ years that will help you get your discoveries noticed and techniques to greatly shorten the time between "issues discovered" and "fixes deployed". The session will be loaded with real examples from large scale distributed environments. Ron will cover not only the content of your messages but also other techniques that change your message from "just another nag in the pile" to "We need to act on this now!".
Ron will end with some consulting where audience members share "less than successful" communications examples and we use what we learned to devise better ways for the future.

The performance of a composite business service application depends on the design of the business process model (BPEL) built with in-house business services and external web services, which in turn is influenced by the underlying infrastructure, operating system and middleware components and their tuning parameters in both internal and external deployment environments. Usually a reactive approach of application testing is used to determine application performance, which is costly, time consuming and possible only at the final stage of the software development life cycle. Consequently, a proactive approach of performance modeling is leveraged, using analytical techniques such as QN, EQN and LQN tools and simulation techniques based on process-oriented or discrete event simulation packages and tools. One such simulation tool is SIM QPN, which simulates the Queuing Petri net formalism and is built on top of a generic discrete event simulation package. We modeled an application built with composite business services for performance using both LQN and SIM QPN tools. Our results are analyzed and presented in this paper.

Standard logging facilities such as log4j and syslog are very useful, but unfortunately lack many of the facilities required during problem determination. Agile development organizations and DevOps groups need turnkey, quick-to-deploy performance and diagnostics capabilities in order to improve application quality and response time. Logging would be a promising way to accomplish this; however, many logging tools lack some of the key requirements.
Logging is the most common way to trace, debug and troubleshoot applications, but it carries with it numerous issues. A lack of structure can make it difficult to relate application activity messages spread across multiple applications and multiple tiers. Logging can also be a heavy burden on application development, as log entries take a long time to correlate and are unhelpful in reproducing production problems. Inspecting and relating log entries is a tedious, manual process. There is a faster way: a method that brings intelligence and speed to logging and tracing.
In this session an expert in Java and middleware-centric architectures will discuss how to use application logging to track transactions, messages or any activity across applications for effective diagnostics and troubleshooting. Using key examples from enterprise customers, the presenter will illustrate how companies can rapidly deploy a real-time performance and diagnostics service that can track and correlate activities across composite applications. This session will also cover how to turn events into "smart events" in order to improve application quality and response time.

455 (PET): Performance Extrapolation Across Servers

Performance prediction of a multi-tiered enterprise application prior to its deployment is useful to ensure that the application meets its throughput and response time SLAs. These applications are load tested in a test environment with a small number of users, which gives application owners insight into the scalability of the application. However, load testing results depend on the size and configuration of the various servers and cannot be easily extrapolated from the test to the production environment. We present a technique that provides performance extrapolation of the application for a large number of users on any target environment. Our black-box technique requires neither detailed modeling of the application functionality nor any architectural simulation of the target platform. It expects only a single-user test to be performed on the target architecture and, using initial load testing results from the test environment, it extrapolates the throughput for the maximum number of users supported by the system and pinpoints the bottleneck resource. Further, it projects resource utilization at the various servers. The strategy has been tested with a number of sample applications and provides accuracy of about 90% in throughput and utilization metrics.
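The abstract does not describe the extrapolation technique itself, so the following is only a generic operational-law sketch of how a low-load measurement can point to a bottleneck resource and a throughput ceiling. It assumes service demands stay constant as load grows, which real systems often violate. All resource names and numbers are hypothetical.

```python
def service_demands(utilization, throughput):
    """Utilization law rearranged: demand D = U / X for each resource (seconds per request)."""
    return {res: u / throughput for res, u in utilization.items()}

def bottleneck(demands):
    """The resource with the largest demand saturates first; X_max = 1 / D_max."""
    name = max(demands, key=demands.get)
    return name, 1.0 / demands[name]

def projected_utilization(demands, throughput):
    """Projected utilization per resource at a given throughput, capped at 100%."""
    return {res: min(1.0, throughput * d) for res, d in demands.items()}

# Hypothetical low-load measurement on the target environment:
# utilization of each resource while serving 2 requests per second.
measured_util = {"app_cpu": 0.04, "db_cpu": 0.02, "db_disk": 0.10}
measured_tput = 2.0

demands = service_demands(measured_util, measured_tput)
resource, max_tput = bottleneck(demands)
at_half_load = projected_utilization(demands, max_tput / 2)

print("bottleneck:", resource)
print("throughput ceiling (req/s):", round(max_tput, 1))
print("utilization at half the ceiling:", at_half_load)
```

A bound like this is optimistic, since queueing, locking and caching effects usually bend the curve before the ceiling is reached.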

Presenter bio: Subhasri Duttagupta received her B.S. and M.S. degrees in computer science and engineering from the Indian Institute of Technology, Bombay, India, in 1991 and 1993. She completed her Ph.D. in the area of sensor networks in the same department in 2010. Her research interests include performance analysis of computer systems, distributed systems design and analysis, mathematical modeling of systems and information processing in sensor networks. She is currently working as a senior scientist at TCS Innovation Labs, Tata Consultancy Services, Mumbai.

Thursday, November 7, 16:45 - 17:00

CONF: Break

Thursday, November 7, 17:00 - 18:00

There is a cost to managing multiple systems in a Sysplex through a coupling facility. This cost is also known as the Host Effect, and it is the CPU cost to z/OS of making coupling facility requests. The question being answered during this presentation is "When a z/OS system is exploiting a Coupling Facility, then what is the cost to that z/OS system in terms of CPU capacity?" Many times IBM will estimate it is 3% for multiple system management, plus 0.5% for every system in the Sysplex, plus between 0% and 10% for data sharing. What is it for you? During this presentation, the concepts and calculations of the host effect will be discussed.
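The rule-of-thumb estimate quoted above is simple enough to work through directly. The sketch below just adds up the three terms; the function name, the four-system example, the 5% data sharing figure and the 1000 MIPS capacity are hypothetical inputs chosen for illustration, and a real figure would come from measurement.

```python
def host_effect_pct(systems, data_sharing_pct):
    """Rule-of-thumb host effect as a percentage of CPU capacity.

    3% for multi-system management, plus 0.5% per system in the Sysplex,
    plus a data sharing cost between 0% and 10%.
    """
    if systems < 1:
        raise ValueError("a Sysplex has at least one system")
    if not 0.0 <= data_sharing_pct <= 10.0:
        raise ValueError("data sharing cost is quoted as between 0% and 10%")
    return 3.0 + 0.5 * systems + data_sharing_pct

# Hypothetical example: a 4-system Sysplex with a 5% data sharing cost.
estimate = host_effect_pct(systems=4, data_sharing_pct=5.0)
capacity_mips = 1000.0  # hypothetical capacity of one z/OS system
overhead_mips = capacity_mips * estimate / 100.0

print("estimated host effect (%):", estimate)
print("capacity consumed (MIPS):", overhead_mips)
```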

462 (ITSM): Laying Tracks for Better Governance - ITSM for PTC

The Federal Railroad Administration reports approximately 100 accidents every year due to Rail equipment failure, Highway rail grade or fatalities. As a result, section 104 of the Rail Safety Improvement Act of 2008 (RSIA) mandated implementation of interoperable Positive Train Control (PTC) systems by "each Class I railroad carrier and each entity providing regularly scheduled intercity or commuter rail passenger transportation". PTC systems are aimed at avoiding train accidents by enabling quicker decision making through absolute control over the system.
However, due to the complexity of PTC systems and their interdependencies, there is a dire need for a comprehensive governance framework for the management of PTC systems. ITSM frameworks like ITIL v3 and CoBIT provide the required templates and guidelines for preparing such a governance framework.
The speakers will share their experience of implementing a governance framework for PTC systems management.

Carl De Pasquale (College of Staten Island - City University of New York & ADP, USA)

CP

How many virtual machines (VMs) can a cluster support? The number depends on the aggregate cluster resources (CPU/memory), VM usage of cluster resources and the expected VM growth rate. This paper discusses two virtual capacity management approaches. The first is an allocation strategy, where cluster usage is defined as VM configured resources. The second is a demand strategy, which defines cluster usage as VM resource consumption. Both approaches are effective at managing virtual capacity, trending VM growth and projecting cluster saturation (replenishment timeframes). As will be seen, demand usage is often less than configured usage, so the demand strategy, which is the recommended approach, can achieve a greater VM density than the allocation strategy without negatively impacting VM performance.
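The difference between the two strategies can be seen with a small calculation. This is a minimal sketch assuming uniform VMs and linear growth; the cluster sizes, per-VM figures and growth rate are hypothetical, and the paper's own method may account for headroom and peak behaviour that this ignores.

```python
def vms_supported(capacity, per_vm):
    """VM count the cluster can hold: limited by whichever resource runs out first."""
    return min(int(capacity[res] // per_vm[res]) for res in capacity)

def months_to_saturation(current_vms, max_vms, growth_per_month):
    """Linear projection of when the cluster fills up."""
    return (max_vms - current_vms) / growth_per_month

# Hypothetical cluster and per-VM figures.
capacity = {"cpu_ghz": 200.0, "mem_gb": 512.0}
configured = {"cpu_ghz": 4.0, "mem_gb": 8.0}  # allocation view: what each VM is given
consumed = {"cpu_ghz": 1.0, "mem_gb": 5.0}    # demand view: what each VM actually uses

allocation_limit = vms_supported(capacity, configured)
demand_limit = vms_supported(capacity, consumed)

current_vms, growth = 40, 4  # hypothetical: 40 VMs today, 4 new VMs per month
print("allocation strategy:", allocation_limit, "VMs,",
      months_to_saturation(current_vms, allocation_limit, growth), "months left")
print("demand strategy:", demand_limit, "VMs,",
      months_to_saturation(current_vms, demand_limit, growth), "months left")
```

Note that the limiting resource can differ between the two views: in this example CPU is the constraint under allocation, while memory becomes the constraint under demand.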

464 (ITSM): IT's All About Teamwork

Until recently, business areas and IT functions each thought they were the most important, and each thought they knew how to do the other's job best. Times have changed: we are all the business, and our thought processes need to change and catch up with the world around us, especially in the complex insourced, offshored, outsourced, nearshored, right-shored, whatever-shored worlds that we all work in.
It's time to let go, forget what we were and look to what we can be. This paper will look at the challenges we face in today's market and at some simple things we can change: the ways we can build a team mentality, build trust and work as one organisation. There are things that we can do, things that we need to start doing and, most importantly, things that we should stop doing if we are to ensure our organisations really thrive and deliver quality services to their customers.
When people ask you what you do, what do you say? Do you focus on the technology you support or the ITIL process you work in? Do people then still wonder what you do, or what a problem manager or a Wintel engineer does? Well, there is another way, a way that the IT community needs to grasp with both hands if it's truly going to join the 21st century.

Presenter bio: Malcolm's background is in retail and commercial banking, moving into IT Service Management in 1998. Since that move he has managed teams in a variety of areas, including data quality, incident, problem, and change management, before deciding it was much better to stop things breaking than to fix the failures and moving into availability management.
He is an industry-recognized SME who has presented at IT conferences in the U.K. and USA on management information and process implementation. With his background in client-facing finance roles, it is no surprise that Malcolm has developed a partnership approach between the business and the IT areas. This approach encourages all parties to focus on putting the organisation's customers at the centre of all initiatives, something that the IT community and even occasionally the business forget. He has found that this approach ensures initiatives deliver effective, usable solutions for the organisation as a whole.

465 (CONF): Late Breaking: TBD

Late Breaking

Room: Aventine F

Thursday, November 7, 18:00 - 19:00

CONF: BOFS

See BOFS Sign Up Sheets at Session Control for details.

Thursday, November 7, 19:00 - 19:30

CONF: First Time Attendees Reception

Thursday, November 7, 19:30 - 22:30

CONF: PARS

Includes dinner and entertainment.

Friday, November 8

Friday, November 8, 08:00 - 12:00

This workshop introduces a set of patterns and practices essential to developing responsive and highly scalable web-based applications. While it focuses mainly on web-based applications written for Windows, the performance engineering approach recommended can also be applied successfully on other platforms. These practices include defining performance goals, implementing quality assurance procedures to ensure that those goals are being met, and defining appropriate instrumentation to monitor the performance of the delivered application. It discusses the use of measurement tools and modeling to deal effectively with application performance throughout the software development life cycle - from design through development and testing to production deployment and maintenance. How conceptual models of application performance, particularly the YSlow paradigm, have influenced the design of actual measurement procedures and tools is emphasized. Examples that illustrate the topics under discussion are drawn from a case study where these software engineering principles were put into practice.

502 (CONF): z/OS Measurement Update - Overview, Reporting and Usage

Peter Enrico

Room: Aventine C

The latest releases of z/OS, as well as the newest zEC12 processors, provide z/OS performance analysts with a variety of new and updated measurements. During this forum session, come hear Peter Enrico provide an overview of a hodgepodge of these new and updated measurements, as well as examples of measurement data mining, reporting, and usage. Measurements discussed will include some of the latest measurements related to processor utilizations (PR/SM, CP, zIIP, zAAP CPU times), processor cache counters, I/O, WLM, SMF 30 measurements, WebSphere Application Server, and other measurements of interest. The objective of this technical forum session is to enable attendees to head back to work, roll up their sleeves, and learn more about the performance of their z/OS environments with a new array of z/OS performance measurements.

504 (APM): Model-Based Performance Engineering

Dr. Connie Smith

Room: Aventine E

Model Driven Engineering (MDE) is an approach to software and system development based on models and transformations among them. Developers represent designs with models such as Unified Modeling Language (UML) or Systems Modeling Language (SysML), analyze those designs, and may ultimately generate code from them. Software Performance Engineering (SPE) is based on performance models of systems. This workshop introduces the end-to-end approach for modeling performance of software and systems based on transformations between software and system models and performance analysis models. We review:

• The relevant subset of UML/SysML models and MARTE, the Modeling and Analysis of Real-Time Embedded systems profile for specification, design, verification/validation, and analysis of systems
• The information requirements and the technology used for Software Performance Engineering models
• The model interchange framework that connects design and performance models, the specification of model studies and the results desired
• Examples and a case study of an end-to-end performance assessment

505 (ITSM): Software Performance Engineering Maturity Model

Kevin Mobley, The Ian Thomas Group LLC.

Room: Aventine F

This forum will focus on adapting a Software Performance Engineering (SPE) Maturity Model for a particular client. The author will present how the model was used to identify key indicators the client can use to improve its overall software performance while improving its efficiency and financial performance. Key questions asked, and answered, are: Will the client deliver improved software using SPE? Will the client get the most out of its digital technology for the financial investment? What is required for the client to have technically efficient digital solutions over a long lifetime?