Main

Tag Archives: applications

Nearly every modern application relies on databases for data persistence. The database access layer is very often responsible for serious performance problems. In the case of database problems most people start searching in the database itself. Appropriate indexes and database structures are vital for achieving adequate performance. Often, however, the application layer is responsible for poor performance or scalability problems.

The application layer controls and drives database access. Problems at this layer cannot be compensated in the database itself. Therefore the design of adequate data-access logic is vital for achieving performance and scalability. While there a nearly endless different use cases for database-driven applications, the problems can be nailed down to a small set of anti-patterns. Analyzing whether your application implements the following anti-patterns and resolving them will help to easily implement faster and more scalable software with minimal additional effort.

Misuse of O/R Mappers

O/R mappers have become a central part in modern database applications. O/R mappers take away the burden of translating and accessing relational data from object-oriented software. They hide great parts of the complexity of data access from the application programmer. This results in higher productivity as the developer can concentrate on the actual application logic rather than infrastructural details. Complex data graphs can be easily navigated at the object-relational layer without seeing what is going on under the hood. This often creates the wrong impression that these frameworks take away the burden of designing data-access logic.

Often developers think that their data-access framework will simply do things right; however, using O/R mapping frameworks without understanding their inner workings in many cases results in poor application performance. There are two central misunderstandings that cause these problems – loading behavior and load time.

O/R mappers load data on a per-object base. This means when an object is requested or accessed the necessary SQL statements are created and executed. This principle is very generic and at first sight works well in most situations. At the same time it is very often the source of performance and scalability problems.

Let’s take a simple example. In a database for storing address information, we have one table for persons and one for addresses. If we want to get the name for each person and the city they live in, we have to iterate over the persons and then access the address information. The image below shows the result if the out-of-the box query mechanisms are used. This simple use case results in a high number of database queries.

This directly brings up the second important detail of O/R mappers – load time. O/R mappers – if not told otherwise – try to load data as late as possible. This behaviour is referred to as lazy loading. Lazy loading ensures that data is loaded as late as possible with the goal to perform as few database queries as possible and at the same time avoid unnecessary creation of objects. While this approach is generally feasible, it may result in serious performance problems and so-called LazyLoadingExceptions on accessing data that has not been loaded when no database connection is present.

In situations like the one described above data loading and at the same time performance can be significantly improved by using specialized data queries.

So while O/R mappers can be of great help in the development of data access they still leave the burden of designing proper data access logic. Dynamic architecture validation with tools such as dynaTrace can be of great help here to identify performance weak points in the application and proactively resolve them.

Load More Data Then Needed

Another anti-pattern in database access that can often be found is that much more data is loaded that actually needed. There are a number of reasons for this. Rapid Application development tools provide easy ways of linking data structures to user interface controls. As the data layer is built of domain objects, they very often contain much more data than is actually visualized. The example uses the address book scenario again. This time the names of the persons and their home cities are visualized. Instead of just loading these three items both objects – addresses and persons – are loaded. This results in massive overhead at the database, network and application level. The usage of specific queries can help to massively reduce the amount of queried data. This performance improvement however comes along with additional effort for maintenance. Adding a new column to the table might require several changes to the data access layer.

This anti-pattern can also be found very often in case of improperly designed service interfaces. Service interfaces are often designed to be generic enough to support a large number of use cases. This has the advantage that services have small contracts which can be used in a wide variety of use cases. Additionally uses cases change faster than the backend service implementations. This can result in services interfaces being inappropriate for certain scenarios. Developers will then have to use workarounds which might result in highly inefficient data access logic. This problem often arises in data-driven Web Services.

In order to overcome these problems data access patterns should be continuously analyzed during development. In the case of agile development approaches, data access logic should be checked for each finished user story. Additionally data access patterns should also be analyzed across application use cases to understand data access logic to be able to optimize data access logic according during development.

Inadequate Usage of Resources

Databases are a bottleneck for resources in applications, so they should be used as little as possible. Very often too little attention is paid to the usage of database connections. As with any shared resource such connections massively affect overall system performance. Specifically, web applications and applications using O/R mapping frameworks with lazy initialization tend to keep database connections longer than needed. Connections are acquired at the beginning of processing and kept until rendering is finished or no further data access is required. In applications using O/R mappers, they are often kept to avoid nasty lazy initialization problems. By redesigning data access logic and separating it from post-processing (like rendering), the performance and scalability of an application can be dramatically improved.

The chart below shows the response time of ten concurrent data processing threads. In the first part one database connection is used. In the second scenario ten connections are used. In the third scenario two database connections are used but two thirds of the processing is performed after having returned the connection. With better designed data access the third scenario nearly achieves the same performance with a tenth of the resources.

One Bunch of Everything

One Bunch of Everything is an anti-pattern that can generally be observed in development but even more in agile teams. The characteristic of this anti-pattern is that primarily features are developed and all data access is treated equally, as if there would not be any differences. However treating different types of data and queries differently can significantly improve application performance and scalability.

Data should be analyzed regarding its lifetime characteristics. How often does it change or if it is modified or only read. Access frequency of data, together with access patterns, provides hints on potential sources for caching. Access frequency also provides hints as to where optimizations make the most sense. This avoids premature and unnecessary optimization and guarantees the highest impact of performance tuning.

Analyzing usage patterns of data also helps to tune the data access layer. Understanding which data is really used helps to optimize loading strategies. Understanding how users browse search results, for example, helps to optimize fetch sizes. Knowing whether users look at order details helps to select lazy or eager loading for order positions.

In addition to data, queries should also be analyzed and categorized. Important factors are query duration, execution frequency and whether they are used in an interactive user context or batch-processing scenario. Transactional characteristics further help to fine tune isolation levels of queries.

Running short-running interactive queries of users and long-running reporting queries on the same connection for example may easily result in bad end user experience. Long-running reporting queries can greedily acquire database connections leaving end-user queries starving. Using different connection pools for different query types results in much more predicable end user performance. Softening isolation level on database queries where they are not required can also lead to significantly improved performance and scalability.

Bad Testing

Finally, missing or improper testing is one of the major reasons for performance and scalability problems in database-accessing applications. I recently gave a talk on this topic and asked the audience whether they see database access as a performance problem in their applications. While all of them agreed, nobody had testing procedures in place to test data access performance. So while it seems to be an important topic, people do not seem to invest in it.

However, even if testing procedures are in place, this does not necessarily mean that testing is done correctly. Although a lot of problems in data access logic can be found right after the code has been developed, testing is performed much later in the load testing phase. This introduces unnecessarily high costs as changes are performed late in the lifecycle possibly requiring architectural changes leading to additional development and testing efforts.

Furthermore test cases have to be designed to test real world data access scenarios. Data access has to be tested in a concurrent mode and using different access types. Only combined read/write access helps to identify locking and concurrency problems. Additionally adequate variation of input data is required to avoid unrealistically frequent cache hits.

Very often people also do not know for which load to test as they have no adequate information on expected load. In my experience this is very often the case – unfortunately. This, however, is not an excuse for not defining load and performance criteria. It is still better to have some criteria defined instead of not defining them at all.

In case you really have no clue on performance characteristics the best approach is to use load testing criteria with increasing load until the saturation point of the application is reached. Then you have identified the peak load of the application. If this sounds reasonable and realistic you are on a good way. Otherwise you know where you have to improve performance. In most cases initial tests show that application can cope with much less load as expected.

Conclusion

Database access is one of the most critical areas impacting performance and scalability in modern applications. While frameworks support in building data access logic, a serious amount of thought still has to be put into the design of data access logic to avoid pitfalls and problems. The key is to understand the details of the dynamics and characteristics of an application’s data-access layer.

After receiving several questions about the new Oracle 11 protocol on LoadRunner and the version 12 of the E-Business suite, I thought it would interesting to give a few tips about these technologies too. Oracle Web Applications 11i is the new protocol used by the newer LoadRunner versions to script the E-Business suite. The advantage of it is that you can use a single protocol to script the application instead of using the multi-protocol approach. This can be useful if your license is limited or you don’t want to spend more money on protocols that you don’t have. Another advantage is that this protocol can be used to script the newer version of the E-Business suite, 12. Tip 1: Correlating the Login After recording your basic script, the first thing you will have to correlate on the Login is the ICX ticket. In order to do do this, you will have to add this command right before clicking on the functionality link:

Eg.: If you accessing the “Submit Processes and Reports” functionality for example, you will have to add this code right before the “web_text_link(“Submit Processes and Reports”…” line. This parameter will be used later on the “nca_connect_server” and “frmservlet” requests. On both requests, look for the “icx_ticket=” part of the URL and replace the code to your parameter ({ICX}) You will also need to correlate another session parameter that sometimes is not correlated automatically and sometimes is wrongly correlated when you’re recording the script. Look for the “web_url(“frmservlet”…” line mentioned above and add the following block BEFORE it:

This will create the “Jsession” parameter that will be used on another call before the “nca_connect_server” call. Look for a line that starts with “web_url(“lservlet;jsessionid=” and replace on the URL parameter the large string that comes after “/lservlet;jsessionid=” with your parameter. Remember, do not replace the large session string on the title (first parameter), but on the URL parameter. Eg.:

To summarize things you will have to do 5 things. 1. Capture the ICX paramter 2. Replace the ICX parameter on the frmservlet call 3. Replace the ICX parameter on the nca_connect_server request 4. Capture the Jsession paramter 5. Replace the Jsession parameter on the lservlet;jsessionid= call This should also work if you’re scripting against an Oracle 11 application with the new LoadRunner protocol. And remember, this has worked for me, but may not work for you depending on how your application was deployed and configured.

Tip 2: Recording Sessions with HTTPS (SSL) Your application may be configured to use SSL (HTTPS connections). If this is your case, you probably won’t be able to record your scripts directly since LoadRunner is not able to identify which server you’re connecting to. To solve this issue you will have to manually add your application server address and port before recording. Click on the “Options” button on the lower left corner of the recording window (before you start recording). Select “Network > Port Mapping” on the left side pane and click on the “New Entry” button. This screen should be presented:

Target Server and Port are respectively your application server address and port. Service you can leave as “Auto Detect”. Record Type you can leave as “Proxy”. Set Connection type to “Auto”.

This will enable the SSL Versions and SSL Cyphers fields. With this fields you will have to check with your application administrator or spend some time “guessing” the correct configuration.

Click on Update and start recording the script. If everything was set correctly, LoadRunner should be recording your actions inside the application as usual.

Searching about a few issues I’m having with an Adobe Air application, I found this cool article about Performance-Tuning Adobe Air applications written by Oliver Goldman from Adobe.

Application performance is perennial. It’s in its nature. In order for an application to perform well, every part of the application has to perform well. Lapse in one area and it brings your entire application down. It’s dif?cult to write a large application without letting your guard down once in a while.

Questions about performance often indicate a failure to understand this weakest-link-in-the-chain aspect of the problem. Here are some of my favorite lousy questions about performance and AIR applications:

Will my AIR application be fast?

Is AIR fast enough to do X?

Isn’t AIR too slow to do Y?

(Here’s proof also that no matter what your kindergarten teacher told you, there is such a thing as a lousy question.)

AIR almost never makes it impossible to achieve good performance in your application. On the other hand, AIR can’t do it for you, either. Like I said, it’s the nature of the problem.

Fortunately, standard tuning techniques apply to AIR as much as they’d apply to writing any piece of desktop software.

Asking good questions

Achieving good performance starts, like most engineering problems, with understanding the problem you’re trying to solve. Here are some good questions to ask about your application:

Which operations in my application are performance sensitive?

What metric can I use to measure this sensitivity?

How can I optimize my application to that metric?

Most applications contain a lot of code that runs well enough. Don’t spend your time on that stuff, especially if any gains would be below the threshold at which users could notice them. Make sure you’re focused on things that matter.

Common examples of operations worth optimizing are:

Image, sound, and video processing

Rendering large data sets or 3D models

Searching

Responding to user input

Defining metrics

Performance is often equated with speed, but don’t be lulled into thinking that’s the only metric that matters. You may ?nd that you need to tune for memory use or battery life. Applications that minimize use of these may also be considered better performing than those that don’t. Sometimes optimizing for other metrics also speeds things up, but other times trade-offs are required.

Regardless of what you’re measuring, you must have something to measure. If you’re not measuring anything, you can’t tell whether changes improve performance or harm it. Good metrics have these three properties:

They’re quanti?able. You can measure them and record them as a number.

They’re consistent. You can measure them repeatedly and usefully compare measurements.

To make this concrete, suppose you’re writing an application that’s going to perform some image-processing tasks on a large set of images. During the processing, the application needs to display feedback on its progress to the user. It also needs to allow the user to cancel an operation, rather than waiting for it to complete. This is a simple application, but even it has at least three interesting metrics that we can examine.

Example: Throughput

The ?rst and most obvious metric is throughput. It’s meaningful, in this example, because we know we must process a large number of images. The higher the throughput, the faster that processing completes.

Throughput is easily quanti?ed as processing per unit time. Although it could be measured as the number of images processed, measuring the number of bytes can produce a more consistent value when image sizes vary. Throughput for this example is easily measured in bytes per millisecond.

Example: Memory use

A less obvious metric for this application is memory use. Memory use is not as visible a metric to end users as is throughput. Users have to run another application, such as Activity Monitor, in order to monitor memory use. But memory use can be a limiting factor: run out of memory, and your application won’t work.

Memory use is of interest in our image-processing example because the images themselves are large. We’d like to be able to process large images—even those that exceed available RAM—without running out of memory. Memory use is straightforward to measure in bytes.

Example: Response time

The ?nal metric for our sample application is one that’s often overlooked: response time to user input. This metric is immediately visible to all of your users, even if they rarely stop to measure it. It’s also pervasive. Users expect all operations—from resizing windows, to canceling an operation, to typing text, to respond immediately.

Whereas some metrics are perceived linearly by users, response time has an important threshold. Any lag in response to input over approximately 100 milliseconds is perceptible to users as slow. If your application consistently responds below this threshold, no further optimization is necessary. Clearly, this metric is easily quanti?ed in milliseconds.

Response time is a particular challenge for the image-processing application because processing any individual image will take well over 100 milliseconds. In some programming environments this is addressed by handling user input on a different thread from long-running calculations. Under the covers, this solution depends on the operating system switching thread contexts quickly enough such that the user input thread can respond in time. AIR, however, doesn’t offer an explicit threading model and so this switch must be done explicitly. This is illustrated in the next section. The following sample demonstrates three different ways of setting up image processing, optimizing for different metrics:

Taking measurements

Once you’ve identi?ed and de?ned your metrics but before you can address them, you must be able to measure them. Only by measuring and tracking your metrics before and after can you determine the impact of those changes. If possible, track all of your metrics together so you can see how changes made to optimize one metric might impact others.

Measuring throughput

Throughput can be conveniently measured programmatically. The basic pattern for measuring throughput is:

Measuring memory

Memory is a more complex subject. Most runtime environments, including AIR, don’t provide good APIs for determining an application’s memory use. Memory use is best monitored using an external tool such as Activity Monitor (Mac OS X), Task Manager (Windows), BigTop (Mac OS X), and the like. After selecting a monitoring tool, you need to determine which memory metric you want to track.

Virtual memory is the biggest number reported by tracking tools. As the name suggests, this does not measure the amount of physical RAM the process is using. It’s better thought of as the amount of memory address space the process is using. At any given time, some portion of the memory allocated to the process is typically being stored on disk instead of RAM. The amount of RAM plus space on disk taken together is often thought of as being equivalent to a process’ virtual memory, but it is possible that portions of the address space are in neither place. The details depend on the operating system and how it allocates portions of virtual memory for different purposes.

The absolute size of virtual memory your application is using, given what virtual memory encompasses, is likely not an interesting metric. Virtual memory of your application relative to other, similar applications may be of interest, but is still dif?cult to usefully compare. The most interesting aspect of virtual memory is its behavior over time: growth without bound generally indicates a memory leak. Memory leaks may not show up in other memory metrics because the leaked memory, if not referenced, gets paged to disk and then simply stays there.

The best memory metric to monitor is private bytes, which measures the amount of RAM your process is using and which is used only by your process. This metric speaks directly to the impact your application has on the overall system, courtesy of its use of a shared resource.

Private bytes will ?uctuate as your application allocates and de-allocates memory. It will also ?uctuate as your application is active or idle as, when its idle, some of its pages may be paged to disk. To track private bytes, I recommend using a monitoring tool to take periodic samples (that is, one per second) during the operations you’re optimizing.

Other memory metrics you may see in monitoring tools include resident size and shared bytes. Resident size is the total RAM use of your process, made up of private and shared bytes. Shared bytes are sections of RAM that are shared with other processes. Usually these sections contain read-only resources, such as code, from shared libraries or system frameworks. Although you can track these metrics, applications have by far the most control over—and problems with—the private bytes value.

Response time

Response time is best measured with a stopwatch. Start when the user takes an action, for example, clicking a button. Stop when the application responds, typically by changing the displayed user interface. Subtract the two and you have your measurement.

The optimization process

With goals and metrics in place you’re ready to optimize. The process itself is straightforward and should be familiar. Repeat these three steps until done:

Measure

Analyze

Modify

Broadly speaking, analysis can lead you to one of two kinds of changes: design or code.

Design changes

Design changes generally have the largest impact. They can be more dif?cult to make later in the game, however, so be sure not to wait too long before de?ning and measuring against your performance goals.

For an example, let’s return to our image-processing application. A naive implementation might load each image in its entirety into memory, process it, and then write the results back to disk. The peak memory use (private bytes) of this application is then primarily a function of the size of the loaded images. If the images exceed available RAM, the application will fail.

Few image-processing operations are global; most can be performed on one portion of an image at a time. By dividing the image into ?xed-size chunks and processing them one at a time you can limit the peak memory use of the application to a number of your choosing. This also enables processing images that are larger than available RAM.

After modifying your design, be sure to re-evaluate all of your metrics. There is always some interplay between them as designs are evolved. Those changes may not always be what you expect. When I prototyped this sample application, processing images in ?xed-size chunks did not signi?cantly alter the throughput of the application, despite my expectation that it would be slower.

Code changes

When no further design enhancements present themselves, turn to tuning your code. There are many techniques to experiment with in this arena. Some are unique to ActionScript; some are not.

Be careful not to apply code changes too early. They tend to sacri?ce readability and structure in the name of performance. This isn’t necessarily bad, but if applied too early they can reduce your ability to evolve and maintain your application. As Donald Knuth said, “premature optimization is the root of all evil.”

Purpose-built test applications

Real-world applications are often large, complex, and full of code that runs fast enough. To help focus your optimization on key operations, consider creating a test application for just that purpose.

Among other advantages, the test application provides a place to include instrumentation (that is, for measuring throughput) without requiring that you include that code in your ?nal application.

Of course, you need to validate that your optimization results still apply when your improvements are ported back to your application.

Chunking work

As mentioned earlier, the AIR runtime does not provide a mechanism for executing application code on a background thread. This is particularly problematic when attempting to maintain responsiveness during computationally intensive tasks.

Much like chunking in space can be used to optimize memory use, chunking in time can be used to break up computations into short-running segments. You can keep your application responsive by responding to user input between segments.

The following pseudo-code arranges to perform about 90 msec of work at a time before relinquishing control to the main event loop. The main event loop ensures that, for example, mouse-clicks are processed. With this timing, most user input will be processed within 100 msec, keeping the application responsive enough from the user’s point of view.

In this example, it’s important that doWork() runs for signi?cantly less time than the chunk duration in order to maintain responsiveness. To keep under 100 msec worse case, it should run for no longer than 10 msec.

Again, re-measure all metrics after adopting an approach like this. In my image-processing application, my throughput dropped by about 10% after adopting this chunking approach. On the other hand, my application was responsive within 100 msec to all user input—instead of only between images. I consider that a reasonable trade-off.

Wrapping up

Creating high-performance applications isn’t easy, but it is a problem that responds to disciplined measurement, analysis, and incremental improvement. AIR applications are not fundamentally different in this regard.

Performance is also an evolving target. Not only does each set of improvements potentially impact your other metrics, but underlying hardware, operating system, and other changes can also shift the balance between what’s fast and what’s slow. Even what you’re optimizing for might change over time.

With good practices in place you’ll be able to create high-performance AIR applications—and keep them that way. Just don’t let your guard down. All it takes is one slow feature to have users asking, “Is your application fast enough to do X?”