Discussions

BEA and Mercury have announced a strategic partnership, one result of which is the bundling of the Mercury Diagnostics Profiler with WebLogic 9.1. The Mercury Diagnostics Profiler is geared primarily towards developer use, but it is the same technology that can be bundled with LoadRunner or Business Availability Center to identify Java and .Net performance and memory issues in your test and production environments.

The profiler has features on par with some of the other profilers on the market (JProbe, YourKit, etc.), but also touts one of the lowest overheads of any profiler, since this is the same technology used in their high-performing production monitoring software. Other features include: performance hotspots, heap dumps including object sizes, SQL tracing, argument capture, the top 3 slowest instances, synchronization diagnostics (for debugging thread issues), and collection growth statistics (for tracking leaky collections).
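
As a rough illustration of that last feature, collection-growth tracking amounts to sampling a collection's size over time and flagging sustained growth. The sketch below is my own simplification, not Mercury's implementation:

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Illustrative sketch of "leaky collection" detection: keep a sliding window
// of size samples and flag the collection once the window is strictly
// increasing. A real tool would sample many collections on a timer.
class GrowthTracker {
    private final int window;
    private final Deque<Integer> samples = new ArrayDeque<>();

    GrowthTracker(int window) { this.window = window; }

    /** Record a size sample; returns true once the last `window` samples only ever grow. */
    boolean sample(int size) {
        samples.addLast(size);
        if (samples.size() > window) samples.removeFirst();
        if (samples.size() < window) return false;
        Integer prev = null;
        for (int s : samples) {
            if (prev != null && s <= prev) return false;
            prev = s;
        }
        return true;
    }
}
```

A collection that shrinks or plateaus within the window is never flagged, which keeps normal caches from showing up as leaks.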

You know, it's funny to think about now, but I remember back when J2EE containers were first coming out, much of the promise was that container vendors would provide all these code optimization heuristics and diagnostics so that you could easily solve issues once your app was deployed. Somehow that never came to pass.

Well, it is looking at everything in your class loader, VM, etc., so it has to be slow. That's when it will give you all the important details of what's going on. It will also point out how many unnecessary instances are hanging around for no reason. This is very useful in the development phase, and also when you need to troubleshoot a weird problem in production, like a memory leak or code behaving badly after some time. It can also help avoid a lot of unnecessary app server restarts by revealing what is wrong internally. It's good to have this bundled in an app server; otherwise, configuring a probe tool on an app server externally is painful. I wish all app servers came with a tool like this.

With any kind of performance monitoring tool, you're usually going to take some performance hit. However, the size of the hit is really what needs to be addressed. If you try to instrument your entire application, then each method grows by some constant (say, 3 bytecode instructions). If you add 3 bytecode instructions to a get…() method, that's a lot of overhead (what was 1 bytecode instruction is now 4). However, if you add 3 bytecodes to a 100-bytecode method, then the performance hit is much lower. This is a bit of a simplistic example, but you get the idea.

To answer your question about using the probe in production: it is designed for production use since the probe doesn’t use any of the heavy-weight JVMPI calls (a reason why you can’t run a standard profiler in production). This version is crippled in that it doesn’t allow monitoring of more than a few application threads (usually enough in your desktop environment), but it runs successfully in production environments. If you were to factor in the amount of time spent in logging code, I would be willing to bet that this has less overhead than most of that code, no matter what your application.

We should also be clear about what is termed "overhead". By this I mean both latency _and_ throughput. If you measure end user response time (really the metric that matters most to us), then the latency with the probe installed is near zero. I am planning on putting together a test for some of these production monitoring probes in the near future, so stay tuned. That should help answer some of your concerns here.
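
The getter-versus-large-method arithmetic above can be sketched in source form. This is my own illustration of what injected entry/exit probes amount to; `Profiler` is a hypothetical stand-in, not Mercury's API, and a real probe would inject the equivalent bytecode (with a toolkit such as ASM) rather than source:

```java
// Hypothetical sketch of injected entry/exit instrumentation, shown as source.
class InstrumentedAccount {
    private long balance;

    // Trivial getter: two added probe calls would dwarf the one-instruction
    // body, so good instrumentation policies skip methods like this.
    public long getBalance() {
        return balance;
    }

    // Non-trivial method: the same fixed probe cost is negligible relative
    // to the work done, so the relative overhead is small.
    public long settle(long[] postings) {
        long start = Profiler.enter("settle");          // injected
        try {
            long total = balance;
            for (long p : postings) total += p;
            balance = total;
            return total;
        } finally {
            Profiler.exit("settle", start);             // injected
        }
    }
}

class Profiler {
    static long enter(String method) { return System.nanoTime(); }
    static void exit(String method, long start) {
        long elapsedNanos = System.nanoTime() - start;
        // a real probe would aggregate elapsedNanos per method here
    }
}
```

The two injected calls are a fixed cost per invocation, which is exactly why instrumenting accessors is expensive in relative terms while instrumenting substantial methods is cheap.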

This is not completely true. Yes, there are some aspects of JVMPI which are extremely costly, especially in the area of heap analysis. For call stack capturing, thread monitor contention, and GC events, the overhead is minimal for any real-world JEE application.

If the Mercury Probe does not make any JVMPI calls, then it cannot report on the impact of GC during the processing of a request, transaction, or method invocation; thus, in any production environment, the figures reported by Mercury Diagnostics are subject to LARGE inaccuracies.

A developer cannot code around a GC event, as it is a JVM runtime environment issue and more than likely the result of the execution of other concurrent requests, transactions, or invocations.

Benchmarking and Garbage Collection
http://www.jinspired.com/products/jdbinsight/callstackbenchmark.html
This Insight article highlights the importance of recording garbage collection times for transaction clock time reporting in Java application performance management solutions. The performance costs of temporary object allocations are also discussed.

By the way, can you state the pricing for the non-crippled version of the tool, Mercury Diagnostics for J2EE?

If the Mercury Probe does not make any JVMPI calls, then it cannot report on the impact of GC during the processing of a request, transaction, or method invocation; thus, in any production environment, the figures reported by Mercury Diagnostics are subject to LARGE inaccuracies.

Let's be clear…. What you're suggesting is adding additional overhead so that you can trace the amount of time spent in garbage collection versus the time spent in user code. As you know, much depends on the type of problem you are trying to address, and your additional GC statistics can be very useful if you are trying to track down a problem that occurs when too much GC is taking place (usually due to over-allocation of objects, either by the JVM itself, as you note in your study, or by user code). In this case, there is more than one way to skin a cat. The ability to analyze over-allocation of user objects can be accomplished either by tracking GC time (as you do) or by having the ability to view the total # of user-allocated objects within a given timeframe. The Mercury probe has the capability to do the latter, though that feature is not available in the free version.

I would also argue that most of the end user performance problems that users can reasonably deal with are contained in their own code rather than in the JVM, so while I see this as a useful feature, it addresses only one particular problem case (with an increased performance penalty from the additional JVMPI calls). If you’re truly interested in more detailed statistics on GC impact, then consider using something like jvmstat or BEA’s Mission Control tools.

Mercury doesn't state prices on their website, though if you already have LoadRunner in house, the probe is a nice add-on to capture high-detail Java data.
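
The "total # of user-allocated objects within a given timeframe" approach can be sketched as a per-class counter that instrumented constructors report to. The injection itself would be done by a bytecode tool; `AllocCounter` here is a hypothetical illustration, not the Mercury probe's API:

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

// Sketch of per-class allocation counting. An instrumented constructor would
// call AllocCounter.count(this); here the call sites are written by hand.
class AllocCounter {
    private static final ConcurrentHashMap<Class<?>, LongAdder> COUNTS =
            new ConcurrentHashMap<>();

    static void count(Object o) {
        COUNTS.computeIfAbsent(o.getClass(), k -> new LongAdder()).increment();
    }

    static long countFor(Class<?> c) {
        LongAdder a = COUNTS.get(c);
        return a == null ? 0 : a.sum();
    }
}
```

Sampling `countFor` at the start and end of a timeframe gives the allocation count for that interval, which is the allocation-side view of the same over-allocation problem that GC-time tracking sees from the collector's side.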

You might want to revisit your statements again after you have looked at the JVMPI documentation surrounding GC events and started using a tool that does report on GC times per request, transaction, method invocation, or SQL execution.

Honestly, do you think a GC event callback (every 1-2 seconds) is more heavyweight than the instrumentation of various collections and the monitoring of their reachability?

You cannot see the importance of GC times at this moment because you have never seen such metrics reported alongside clock time, CPU time, and thread monitor waiting and blocking. Even the execution of a SQL statement (which is typically database bound) is subject to outliers caused not by the database query execution engine but by the fact that a GC event occurred during its execution, stopping the reading of the result set from the network stream.

Clay: "In this case, there is more than one way to skin a cat. The ability to analyze over-allocation of user objects can be accomplished either by tracking GC time (as you do) or by having the ability to view the total # of user-allocated objects within a given timeframe."

With JXInsight, every clock time recorded has the amount of GC time (typically JVM pause time) attached, so that I can discount time caused by excessive allocation elsewhere in the JVM. GC events happen irrespective of how poor the code is with regard to temporary object allocation. We had a customer that mixed long-running transactions (10-50 seconds) with short transactions (<1 second). The short transactions occasionally had times of 5-8 seconds, which was unacceptable for this type of operation. The problem was that the large number of objects created during the long-running transactions induced longer GC periods. There was no way to reduce the long-running transactions' object allocation costs, because this was the nature of the transaction. The solution was partially temporal and distributed.
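
For readers wanting to experiment, the GC-discounting idea can be approximated in plain Java with the standard management beans. This is a minimal sketch of the technique, not JXInsight's implementation:

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Sketch: subtract JVM-wide GC pause time from a measured clock time, so a
// long "SQL time" that was really a GC pause is not blamed on the database.
class GcAdjustedTimer {
    private long startMillis;
    private long startGcMillis;

    /** Cumulative GC time (ms) across all collectors in this JVM. */
    static long totalGcMillis() {
        long sum = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long t = gc.getCollectionTime();   // cumulative ms, -1 if unsupported
            if (t > 0) sum += t;
        }
        return sum;
    }

    void start() {
        startMillis = System.currentTimeMillis();
        startGcMillis = totalGcMillis();
    }

    /** Clock time for the interval with GC pause time subtracted. */
    long stopAdjustedMillis() {
        long clock = System.currentTimeMillis() - startMillis;
        long gc = totalGcMillis() - startGcMillis;
        return Math.max(0, clock - gc);
    }
}
```

Note that getCollectionTime() is JVM-wide, so this attributes every GC pause in the interval to the measured operation; that is exactly the point being made above, since the pause is induced by the whole JVM's allocation behavior, not by the measured code.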

Mercury's solution would have developers and DBAs chasing after SQL execution problems that simply do not exist.

Clay: "I would also argue that most of the end user performance problems that they can reasonably deal with are contained in their own code rather than just the JVM"

Are you saying that the actual execution time of the bytecode of the J*EE application is the bottleneck? I rarely see this to be the case. It comes down to the resource usage: databases, message queues, transaction managers, JVM memory, GC, client-to-server roundtrips....

Clay: "Mercury doesn't state prices on their website"

Can you shed some light on this since you are advocating this product?

…started using a tool that does report on GC times per request, transaction, method invocation, or SQL execution

I have used a lot of the performance tools out there, and I agree with you that GC time can be useful in solving some memory-thrashing-related issues; however, as a general-purpose diagnostic feature, it is not the highest one on my list. Also, you’re assuming that I haven’t evaluated your tool. There are some features in JXInsight that I think are very useful, but it is not what I use primarily for performance tuning.

Honestly, do you think a GC event callback (every 1-2 seconds) is more heavyweight than the instrumentation of various collections and the monitoring of their reachability?

So defensive… no need to make an attack here, William. I’ve obviously struck a nerve if you feel as though you need to put accusatory statements in bold typeface. I'm just trying to educate the market a little bit here. If you want to have this feature-for-feature pissing match, then we can continue, but in the end, decisions like these often come down to whether you think the tool can do the job, the amount and quality of support, price, how well it integrates with 3rd-party tools, ease of installation, etc. Anyone who’s been through an RFP process knows there is usually a laundry list of features needed for production monitoring and diagnostics which is not limited to a single platform type. If you’re considering more of a point solution, you’re limited in the level of integrations. If you’re looking for something that can monitor and diagnose issues no matter what the platform (SAP, Citrix, Oracle, MQ Series, J*EE, DotNet, etc.), along with the ability to integrate with existing tools, then you’ll probably want to consider something more in line with what Mercury offers.

(Pissing match, continued…) Your assumption here is that GC’s only occur every 1-2 seconds. Your example does a good job of explaining the nature of this type of issue: you experience some behavior in one part of your application that impacts another (seemingly unrelated) part of your application. To track down the issue, you have to determine the behavior that is causing it. You could either view the large GC collection times, or you could view the large # of object allocations. GC times won’t tell you where the problem is occurring, but if you are tracking when and where the objects are allocated, then you can see the root cause of the problem. Also, being able to simulate the problem in a test environment is especially helpful, and since Mercury already leads the performance test tool market, it’s a no-brainer to add the J2EE Diagnostics plugin to LoadRunner to get more fine-grained Java performance data along with the artificial load produced by LoadRunner.

Also note that when you have a memory-thrashing condition (as we’ve discussed thus far), the # of GC events (both major and minor) typically increases along with the # of object allocations, which means the GC event callbacks themselves could inhibit the processor’s ability to perform the collections necessary for cleanup. I still like the idea of integrating with the GC cycles, but it would be nice to be able to turn that feature on and off without needing to reboot the server. That way, you could still track the GC’s up until the point at which the GC monitoring itself gets in the way of the system’s ability to actually collect the garbage.

Are you saying that the actual execution time of the bytecode of the J*EE application is the bottleneck? I rarely see this to be the case. It comes down to the resource usage: databases, message queues, transaction managers, JVM memory, GC, client-to-server roundtrips....

I definitely agree with you here, and I think this is one positive takeaway from this thread. Users should be paying attention to what’s happening with both internal and external resource usage. Seeing time spent in the JVM is useful, but being able to correlate that with what is happening in the database, your MQ server, network traffic, disk space, kernel parameters, etc., is key to solving many performance issues. Having a tool that monitors resources both internal and external to the JVM helps greatly here.

Mercury's solution would have developers and DBAs chasing after SQL execution problems that simply do not exist.

This doesn’t make sense in the context of what you stated before, nor is it true. Please try to keep the accusations to a minimum here, since you don’t have much supporting evidence for your claim.

Can you shed some light on this since you are advocating this product?

I am a performance consultant who specializes in Java and DotNet. Many of my customers require recommendations for performance monitoring and diagnostics tools. I have partnered with Mercury since I have to offer more than just single point solutions. Many of my customers have heterogeneous environments and need more than just Java and DotNet performance monitoring, and I found that Mercury’s stack covers me in both breadth and depth. I don’t offer up prices because I bundle both products and services which are tailored to a particular company’s needs.

I am making sure that others reading this thread get a true picture of what is being offered, as well as refuting broad statements such as 'JVMPI has too much overhead'.

Your statements continue (even after correction) to imply that the Mercury solution has less overhead than JVMPI, when in fact a large number of the production management and monitoring features can be implemented with better engineering in designing and coding a JVMPI (or JVMTI) agent. It was you that started trashing the API, and all I did was attempt to inform you that this was not the case. Yes, there are problems with the API, but who is to say there are not problems with a vendor's own proprietary profiling agent?

Clay: "Your assumption here is that GC’s only occur every 1-2 seconds."

I never made any assumptions. I have seen customer sites experience a 20-30 second GC pause after a long spell of high workload. No one can state the nature of the GC event lifecycle; it depends on too many factors. That is why it is important, before a developer rushes off to a DBA reporting a 20-second SQL response time, to make sure he has already checked whether a GC cycle perturbed the measurement. A SQL response time can also be impacted by thread monitor contention within the JDBC driver, which is why JXInsight provides a clock-adjusted time (minus GC and thread monitor contention) as well as Service Times.
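
The thread monitor contention half of that adjustment can be sketched the same way with the standard ThreadMXBean. Contention monitoring must be enabled and is itself not free; the class and method names here are my own illustration, not JXInsight's API:

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;

// Sketch: subtract the time the current thread spent blocked on monitors
// (e.g. inside a JDBC driver) from raw clock time.
class ContentionAdjustedTimer {
    private static final ThreadMXBean THREADS = ManagementFactory.getThreadMXBean();
    private long startMillis;
    private long startBlockedMillis;

    /** Milliseconds the current thread has spent blocked on monitors. */
    static long blockedMillis() {
        long id = Thread.currentThread().getId();
        long t = THREADS.getThreadInfo(id).getBlockedTime(); // -1 if monitoring disabled
        return t < 0 ? 0 : t;
    }

    void start() {
        if (THREADS.isThreadContentionMonitoringSupported()) {
            THREADS.setThreadContentionMonitoringEnabled(true);
        }
        startMillis = System.currentTimeMillis();
        startBlockedMillis = blockedMillis();
    }

    /** Clock time for the interval minus monitor-blocked time. */
    long stopAdjustedMillis() {
        long clock = System.currentTimeMillis() - startMillis;
        long blocked = blockedMillis() - startBlockedMillis;
        return Math.max(0, clock - blocked);
    }
}
```

Combined with a GC-time adjustment, this yields roughly the "clock adjusted time" described above: wall clock minus the portions spent in GC pauses and monitor contention.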

Hi Clay, I am making sure that others reading this thread get a true picture of what is being offered, as well as refuting broad statements such as 'JVMPI has too much overhead'.

William, yes, I admit that I was being a bit simplistic with my statement about JVMPI; however, if you look at the response, it was made to the individual who made this claim:

When I used probes, it always takes a lot of performance hit when you turned it on. So you cannot use it for production.

As you probably know, there is a lot of misconception around using a tool such as this in a production environment, due to the (unfortunate) history of the JVMPI interfaces and their large performance impact when used inappropriately. This has given many of the high-performing production monitoring tools (such as Mercury, Wily, Veritas, Quest, and JInspired) a tough spot to work out of, since many people tend to believe that profiler = JVMPI = slowness = not production ready. As you mention, albeit from an exceedingly accusatory standpoint, this is not the case, and I agree with you here. However, the difference between a profiler and a production monitoring tool is something that I’ve struggled to explain for a while, and in order to get my point across, I use JVMPI as an example, since that tends to resonate with those individuals who use profilers in their development environments.

At this point, I’m guessing that most readers have already stopped reading this thread, because it has taken the path that so many TSS articles have lately, which is one of bickering and one-upmanship. Rather than allowing this to be a forum for discussion of the differences between production monitoring tools and profilers, your bold-faced typing and negative attitude have made this all rather dull.

I should also point out that my original statement was:

production use since the probe doesn’t use any of the heavy-weight JVMPI calls (a reason why you can’t run a standard profiler in production).

This was intended to imply that there are lightweight JVMPI calls and heavyweight JVMPI calls. I have yet to find any text that backs your claim that I “started trashing the API”. I simply mentioned that JVMPI calls can be heavy-weight and that you wouldn’t want to run your typical desktop profiler in production.

Yes, I agree with you that the feature you’ve implemented has some value for customers. However, there are also some features in Mercury’s products which would work as well or better in other situations. I’m hoping to clear up some of these misconceptions by educating the market a bit more on these topics. If I can enlist your help in doing this, then we’ll all be better off and you won’t have to continue making unfounded accusations.

One last thing, you mention that:

Yes, there are problems with the API, but who is to say there are not problems with a vendor's own proprietary profiling agent?

I would kindly ask that you do the same here and knock off the accusatory tone and product-bashing. I have just as much ammunition as you do on this, so let’s work on the maturity level here a bit and try a 3rd road instead.

How about you take this one?...

Well, it is looking at everything in your class loader, VM, etc., so it has to be slow.

Clay: "If you measure end user response time (really the metric that matters most to us), then the latency with the probe installed is near zero. I am planning on putting together a test for some of these production monitoring probes in the near future, so stay tuned."

When you do report these figures, can you please at least attempt to fully load the system if multiple processors are being used, and report how many interception points occurred during the transaction processing alongside all execution points? Many performance management tools claim zero overhead but in fact use an event queue system to defer the heavy profiling until after the execution of the transaction, more than likely on an additional processor. This makes it very easy to show zero overhead on a lightly loaded system by measuring the response time of a simple, single-threaded test execution.

The Mercury and JRockit JVM tools do not offer true resource transaction analysis and pattern detection, which is provided today by only one APM solution on the market: JXInsight.

Detecting and understanding transactional database access patterns is well accepted as the biggest and most important challenge for developers, architects, testers, administrators, and operations staff when performance problems are reported against production applications.

Most performance problems that developers cannot resolve within a day of notification with simple thread stack dumps or System.out.println calls require production monitoring solutions that detect concurrency and workload problems. This is what vendors such as JBoss and WebLogic need to start offering via quality product integrations and bundling.

It is also important that solutions work across platforms and runtimes, as many organizations do not standardize on a single JEE vendor or JVM runtime: (Sun, BEA, IBM) x (1.3, 1.4, 1.5).

I would have to agree. The performance hit you take depends on the amount of information you wish to collect. There is no magical free ride here. Doing in-depth code profiling is going to have a cost that you probably can't endure in production. Furthermore, there's a danger that the profiler will add instrumentation to even trivial methods (like getters and setters), resulting in high overhead for little gain. This is why production monitoring is typically done by monitoring tools, not code profilers.

If you are thinking of using this profiler in production, make sure you read the license agreement carefully, as it states:

"this Agreement grants you a nontransferable and non-exclusive license to use, solely for Your internal business purposes in a pre-production, non-staging, non-mission critical development environment for diagnosing application performance problems"

Furthermore, the 5-thread maximum would also make it hard to re-create production-level problems in a dev environment.

I would also like to add that, to work well in a BEA environment, a code diagnostics tool has to be able to support complex SOA and JEE applications whose transactions routinely cross multiple tiers and JVMs. Instrumenting a single VM on a single tier offers little to no visibility into many production issues.

Doing in-depth code profiling is going to have a cost that you probably can't endure in production.

True - you wouldn't want to use the bundled "profiler" in production. However, the version of the probe that Mercury offers for production monitoring will collect relevant data that can be used to track down issues, with relatively little performance hit. For detailed profiling and problem recreation, the probe plugin to LoadRunner is probably your best bet. There is always a tradeoff with these tools, so it is up to the user to know how to use them to solve their issues.

To do this, you have to cast a wide net, say by instrumenting only the critical layers (Servlet/JSP, EJB, JDBC, etc.), and then, as you spot issues, you need the ability to turn off instrumentation for the other layers while you instrument sub-layers of each of the above-mentioned APIs. The nice thing about the Mercury probe is that you can instrument your application before putting it into production and selectively collect data based on what you'd like to drill into. This prevents the performance-related issues of instrumenting and collecting "all data all the time".

The advantage of offering this for free is that you can use the same probe in development, testing, and production with only a few tweaks to the probe's configuration. That ensures that you're testing the probe's performance all through the development cycle, and if you end up over-instrumenting, that should be caught early anyway.
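
Instrumenting one critical layer rather than every method can be sketched with a JDK dynamic proxy around the layer's interface (for JDBC, that would be java.sql.Connection or Statement). This is my own illustration of the selective-layer idea, not how the Mercury probe is implemented:

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Sketch: time every call that crosses a chosen interface boundary, and
// nothing else. Application internals run entirely unprobed.
class LayerTimingProxy {
    /** Accumulated nanoseconds per intercepted method name. */
    static final Map<String, Long> TOTALS = new ConcurrentHashMap<>();

    @SuppressWarnings("unchecked")
    static <T> T wrap(Class<T> iface, T real) {
        InvocationHandler h = (proxy, method, args) -> {
            long start = System.nanoTime();
            try {
                return method.invoke(real, args);
            } finally {
                TOTALS.merge(method.getName(), System.nanoTime() - start, Long::sum);
            }
        };
        return (T) Proxy.newProxyInstance(
                Thread.currentThread().getContextClassLoader(),
                new Class<?>[] { iface }, h);
    }
}
```

Because only calls crossing the chosen interface are timed, getters, setters, and other trivial methods carry no probe cost, which is the essence of instrumenting per layer rather than per method.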

I would also like to add that, to work well in a BEA environment, a code diagnostics tool has to be able to support complex SOA and JEE applications whose transactions routinely cross multiple tiers and JVMs. Instrumenting a single VM on a single tier offers little to no visibility into many production issues.

Good point. The nice thing about the Mercury probe is that it uses some of the same technology as LoadRunner to generate synthetic transactions through your system and trace those transactions across HTTP requests. It also performs cross-JVM tracing, so that you can correlate transactions that hit your Servlet container and call back to your EJB container.

They have Web Services tracing capability as well, and I believe they're adding Java-to-DotNet web service tracing soon. They also have a DotNet probe in case you have a more heterogeneous environment.
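
The cross-JVM correlation idea reduces to tagging each outgoing call with an id and adopting that id on the receiving tier, so events recorded in both JVMs can be joined afterwards. A minimal sketch, with a made-up header name (Mercury's actual header and wire format are not documented here):

```java
import java.util.Map;
import java.util.UUID;

// Sketch of cross-tier transaction correlation: the calling tier stamps an
// id onto outgoing request headers, and the receiving tier adopts it, so
// both JVMs record events under the same trace id.
class Correlation {
    static final String HEADER = "X-Trace-Id";   // illustrative name only
    private static final ThreadLocal<String> CURRENT = new ThreadLocal<>();

    /** On the receiving tier: adopt the caller's id or start a new trace. */
    static String adopt(Map<String, String> incomingHeaders) {
        String id = incomingHeaders.getOrDefault(HEADER, UUID.randomUUID().toString());
        CURRENT.set(id);
        return id;
    }

    /** On the calling tier: stamp the current id onto an outgoing request. */
    static void stamp(Map<String, String> outgoingHeaders) {
        String id = CURRENT.get();
        if (id == null) {
            id = UUID.randomUUID().toString();
            CURRENT.set(id);
        }
        outgoingHeaders.put(HEADER, id);
    }
}
```

With this in place, a Servlet-tier request and the EJB-tier work it triggers share one id, which is all a collector needs to stitch the two JVMs' measurements into a single transaction view.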

I would also recommend taking a look at WLS 9.x Diagnostics Framework. WLDF integrates all diagnostics features and functionality into a centralized, unified framework that enables you to create, collect, analyze, and archive diagnostic data generated by a running server and the applications deployed within its containers. This data provides insight into the run-time performance of servers and applications and enables you to isolate and diagnose faults when they occur. For more information check out http://e-docs.bea.com/wls/docs90/wldf_understanding/intro.html. BEA also bundled the WLDF Console Extension that is available on dev2dev for WLS 9.0 (https://codesamples.projects.dev2dev.bea.com/servlets/Scarab?id=s96) with WLS 9.1.
