I apologize for the lack of postings lately, but the APM marketplace is full of news and I’ve been off covering it in a number of new papers posted at: www.emausa.com. I also took the time to write an article for APM Digest entitled “Why Your APM Solutions may not be Cloud Ready”, available at: www.apmdigest.com/why-your-apm-solutions-may-not-be-cloud-ready. In that article, I discussed some of the ways that hybrid Cloud transactions (those spanning multiple hops, on- and off-premises) are “breaking” traditional APM paradigms.

My coverage of hybrid Cloud management has been ongoing for several years. The latest research indicates that approximately 50% of companies are already running transactions that span on- and off-premises hosting. At the same time, there are few good answers to the problem of monitoring and troubleshooting these complex environments.

I recently heard about an IT organization that has deployed 2,000-hop transactions. Yes, Virginia, a single transaction that traverses 2,000 hops. These complex environments, which I call “FrankenApps”, break traditional deep-dive application management toolsets for two primary reasons (other than sheer scale). One is a lack of APIs and instrumentation on the side of the Cloud provider. The other is that most APM vendors haven’t invested the resources necessary to connect to the APIs of those providers that do supply them. And without visibility into off-premises tiers, APM toolsets cannot support deep-dive troubleshooting and root cause diagnosis in these complex environments.

If you remember, Dr. Frankenstein created his “manlike” invention from bits and pieces of bodies, wire, and string he picked up on his gruesome nightly forays. He brought the entity to life with lightning, and his creation functioned more or less as a viable living creature. However, the outcome was less than successful. The monster took on a life of its own, spiraled out of control, and wrought general havoc.

The similarities to FrankenApps are obvious. They can span multiple bits and pieces of hardware and software, from back office mainframes to virtualized database and application servers, web technology, hosted SOA components, IaaS, PaaS, and/or SaaS, endpoints, etc. When orchestrated and integrated, they resemble an “applicationlike” entity which, like Frankenstein’s monster, is very likely to get out of control.

In fact, the dirty little secret about public Cloud “ease of use” is that most such environments introduce significant monitoring and management challenges. Some, but not all, IaaS vendors, for example, provide monitoring tools and/or APIs. Amazon and Rackspace do, but this is not the case for all such providers. Fewer SaaS and PaaS vendors do so, although more mature companies such as OpSource have done so for years.

Even for public Cloud providers which do deliver such visibility, it is highly variable in terms of granularity and accessibility. Typically, where such capabilities do exist, they are self-contained, stand-alone solutions delivering variable levels of visibility to the outside world via APIs.

The problem is exacerbated by software vendors, whose Application Performance Management tools may or may not integrate with these APIs. The result is that many existing toolsets have minimal visibility into these environments.

While synthetic transactions and end user experience monitoring tools are good options for determining high-level, end-to-end performance and availability, as stand-alone technology they lack the granularity necessary for troubleshooting public Cloud tiers. Most IT organizations I have spoken to, for example, tell me they are leveraging synthetic transactions for high-level performance metrics, but manually correlate performance across monitoring stations and/or geography for troubleshooting purposes.
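The manual correlation described above can be roughed out in a few lines. This is only a sketch under stated assumptions: the station names, the sample response times, and the "twice the typical median" threshold are all hypothetical, and it does not represent any vendor's method. It groups synthetic transaction timings by monitoring station and flags stations whose median is far above the norm, which hints at whether a slowdown is regional or global.

```python
from collections import defaultdict
from statistics import median


def median_by_station(samples):
    """Group (station, response_ms) samples and return the median per station."""
    grouped = defaultdict(list)
    for station, response_ms in samples:
        grouped[station].append(response_ms)
    return {station: median(values) for station, values in grouped.items()}


def slow_stations(samples, factor=2.0):
    """Return stations whose median exceeds `factor` times the median of all
    station medians. The factor is an arbitrary, hypothetical threshold."""
    per_station = median_by_station(samples)
    typical = median(per_station.values())
    return sorted(s for s, m in per_station.items() if m > factor * typical)


# Hypothetical synthetic-transaction results: (monitoring station, response time in ms)
samples = [
    ("nyc", 200), ("nyc", 220),
    ("london", 210), ("london", 230),
    ("tokyo", 900), ("tokyo", 950),
]

flagged = slow_stations(samples)
print(flagged)
```

A result that flags a single station suggests a geographic or network-path problem, while uniformly slow medians point toward the application tiers themselves. Either way, this only narrows the search; it does not show which tier is at fault.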

From an industry perspective, we need two things immediately:

• Cloud service providers (virtualization, IaaS, PaaS, and SaaS vendors) must deliver “hooks” that provide monitoring metrics to traditional APM solutions. At minimum, the tools need to be able to “see” the transaction as it enters and exits to quantify time spent and verify completion. Ideally, error messages and even payload contents should be visible as well. Such information makes it possible to isolate performance problems in multi-hop transactions to a single tier or set of tiers.

• Cloud APM vendors must develop the partnerships and “hooks” necessary to incorporate provider metrics into existing monitoring systems. Most vendors I speak to are notably lacking these partnerships, which means they lack visibility into Cloud provider metrics. This is aggravated by the fact that there is no common cross-vendor API/protocol standard comparable to what network management solutions have in SNMP. IBM is promoting the Open Services for Lifecycle Collaboration (OSLC) standard (see open-services.net). As of this writing, however, the member list includes only two major APM vendors: IBM and Oracle. While such a standard is an excellent longer-term option, history has proven that such standards take a long time to develop. In the short term, APM tools vendors need to integrate the old-fashioned way: by developing relationships and creating adapters for Cloud provider APIs.
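To make the first requirement concrete, here is a minimal sketch of what entry and exit visibility buys you. Everything in it is an assumption for illustration: the `HopRecord` structure, the tier names, and the timestamps are hypothetical, and hops are treated as strictly sequential. Given an entry time, an exit time, and an optional error for each tier, it quantifies time spent per tier, verifies completion, and points at the tier most likely responsible.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class HopRecord:
    """One tier's view of a transaction (hypothetical structure)."""
    tier: str
    entered_ms: int
    exited_ms: Optional[int]      # None means no exit was ever observed
    error: Optional[str] = None   # error message, if the provider exposes one


def summarize(hops):
    """Return (slowest_tier, durations_by_tier, incomplete_tiers)."""
    durations = {
        h.tier: h.exited_ms - h.entered_ms
        for h in hops
        if h.exited_ms is not None
    }
    incomplete = [h.tier for h in hops if h.exited_ms is None]
    slowest = max(durations, key=durations.get) if durations else None
    return slowest, durations, incomplete


# Hypothetical three-hop transaction spanning on-premises and hosted tiers
hops = [
    HopRecord("on-premises web", 0, 40),
    HopRecord("hosted SOA service", 40, 900),
    HopRecord("SaaS billing", 900, None, error="timeout"),
]

slowest, durations, incomplete = summarize(hops)
print(slowest, durations, incomplete)
```

Without the provider-side hooks, the two off-premises records simply would not exist, and the tool could report only that the transaction as a whole was slow or never finished.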

I will be writing about this topic throughout the year, as I truly believe we as an industry finally need to step up to the plate with a better answer. Heck, as a short-term stopgap, I’d even be willing to extend SNMP to applications. It is already the de facto monitoring standard for everything from routers to servers to toasters (well, maybe not toasters…). It is also easy to embed in software systems (a command line is all you need), exists now, and would at least provide SOME level of information until application-specific standards can be developed, a process which, regrettably, could take years.

Meanwhile, Cloud IT customers are the ones who pay the price, since they get the phone call from users when performance goes south. In the end, CIOs have the power to vote with their pocketbooks. Requirements for manageability APIs, and for API support, should be part of the RFP process, and customers should buy public Cloud, virtualization, and APM solutions accordingly.

I would love to hear from Cloud providers and APM vendors who are addressing these concerns. Feel free to respond via a comment to this blog post or at: jcraig@enterprisemanagement.com.

Julie Craig has over 20 years of experience in software development and enterprise systems management. Julie has a Master's Degree in Computer Information Systems, with emphasis areas in Object Oriented technologies and Enterprise Architecture. At EMA, Julie’s focus is on configuration management and application performance.

2 Comments

[…] Craig, who tracks the APM marketplace over at Enterprise Management Associates (EMA), has been writing about this issue with some regularity. She has coined the term FrankenApps for the new, highly complex environments […]

Julie, you introduce an excellent metaphor, and we at Appnomic Systems (www.appnomic.com) concur with your perspective. Your message resonates with our customers and partners including The Open Data Center Alliance (ODCA) and the SIIA, two leading industry organizations driving standards to effectively evolve and tame this new FrankenApp monster. In addition to the sprawl of cloud resources and APIs that constitute today’s FrankenApps, multiple service providers underlie all these operational components. At Appnomic, like Nicholas Carr, the author of “The Big Switch: Rewiring the World from Edison to Google,” we believe many data center resources will consolidate and be offered by IT “utility” type companies. The service providers and utilities will also evolve to support smooth application operations.

Appnomic works with enterprises, and the service providers who support them, to provide consistent measurement of end-user experience — measuring transaction performance response times and the correlated underlying infrastructure component performance. These measurements are used to model and actually predict imminent performance problems. Wonderful possibilities emerge from this unique approach – new application and business revenue generating optimizations, better capacity planning, and completely new approaches to application testing, to name a few. It may sound like science fiction, but we regularly see it happen with customers implementing the next generation of APM in complex hybrid environments. We believe that service providers will embrace the need to move up the stack and take accountability for how infrastructure operations are key factors in application performance — hopefully, they will help keep these FrankenApps under control! In the meantime, we’re succeeding by taming these apps with the help of industry experts and the user community of enterprise and Cloud IT professionals.