The Web Experience Management (WEM) Robustness Team at Adobe has conducted a series of performance and capacity tests to benchmark the behavior of AEM in a variety of common configurations.

The intention is that these results will enable readers to estimate the capacity and scalability of proposed systems.

In addition, this document provides guidance on both:

How to run independent performance profiling tests.

How to interpret results and validate existing systems.

User scalability results for a limited set of author and publish scenarios are also provided to assist in determining system resource utilization and maximum throughput of an application.

Test Scenarios - Author

Authoring scenarios are those that concentrate on the performance of AEM as an authoring platform, without regard to end-user access to published AEM content.

They examine:

The performance of update operations.

The scalability of page creation, editing and activation operations.

Site Creation Scenario

Each thread creates an AEM page. Page content is generated by adding an image and a SWF video. The page is then activated so that it is replicated to the publish instance. After replication, the page is deleted.

This scenario is recorded with all possible GET and POST requests that a user generates while performing these operations.

Note:

Browser caching is not considered.

Definition of Throughput

In all of the benchmark tests reported in this document, throughput is measured as transactions per hour, with each transaction consisting of a specific, representative set of user actions as detailed below. In order to make comparisons with certain real world scenarios, it can be useful to express throughput in terms of updates per hour. In this case the scenario contains 10 separate update operations and so we can also report a figure for updates per hour.

The representative transaction consists of the following steps:

Login and open the Websites console.

Create a page with SWF.

Add an image/text component.

Add tag cloud.

Add a site map.

Add a comment section.

Add a rating section.

Activate the Page.

Delete the Page.

There are approximately 10 page updates in a single transaction.
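The relationship between scenario throughput and update throughput can be sketched as follows (a minimal illustration; the function name is ours, not part of any AEM API):

```python
def updates_per_hour(transactions_per_hour: float, updates_per_transaction: int) -> float:
    """Convert scenario throughput (transactions/hour) into update throughput."""
    return transactions_per_hour * updates_per_transaction

# The Site Creation Scenario performs ~10 updates per transaction, so a run
# measured at 1,581 transactions/hour equates to roughly 15,810 updates/hour:
site_creation = updates_per_hour(1581, 10)  # 15810
```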

Completion of Scenario

The scenario is marked complete when:

1,000 iterations of all steps have been performed.

The publish queue is empty.

No more jobs are waiting to be processed in Sling events.

Scenario Execution

When executing the benchmark test, the objective is to determine the maximum throughput that can be achieved by the system configuration at hand. With configuration variations, this maximum throughput may occur at various load levels. The approach used is to run the benchmark test at gradually increasing load levels, until it is clear that the maximum throughput has been reached. Load is increased on the system by adding additional concurrent request threads, thereby increasing the number of concurrent operations on the server.

A series of benchmark test runs was executed, starting with a set number of threads and increasing the thread count in each run. Throughput was calculated on every run. Combining the results from multiple runs, we obtained the following results:

This graph shows maximum throughput at 14 threads. This throughput is measured without any sleep/wait time. A very slight declining trend can also be seen as thread concurrency increases.

Scenario Throughput: 1,581 Transactions/Hour

There are 10 updates in each transaction, so this equates to approximately 15,810 Updates/Hour.

The following trends were observed for system resources when scaling the number of threads:

Although there is a slight increase in memory utilization with higher thread counts, the overall heap utilization never exceeds 80% of the 4,096 MB heap. The CPU is not heavily utilized in this benchmark test.

Operation at maximum throughput

The publish queue remained idle ~90% of the time. With 10 threads activating at the same time, the maximum queue length recorded was 9.

The average event processing time for replication events remained at ~136.8 ms, whereas the average waiting time was ~154.9 ms. A total of 8,000 events were processed in 40 minutes.

The following graph shows the event processing trend:

The graph indicates that event processing remained constant throughout the run.

Manual Site Browsing

The purpose of this benchmark is to measure how the UI responds, with and without load, and with and without browser caching. To test this scenario, a series of pages was opened and the load time recorded as the time from the first to the last page request. The following pages were tested:
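The "first to last request" timing used here can be sketched as below (a minimal Python illustration; `measure_load_time` and the injectable `fetch` callable are our own stand-ins for whatever HTTP client actually replays the recorded requests):

```python
import time
from typing import Callable, Iterable

def measure_load_time(requests: Iterable[str], fetch: Callable[[str], None]) -> float:
    """Return elapsed wall-clock time, in seconds, from the start of the
    first page request to the completion of the last one."""
    start = time.perf_counter()
    for url in requests:
        fetch(url)  # issue each recorded GET/POST in order
    return time.perf_counter() - start

# Example with a stand-in fetch that just sleeps briefly:
elapsed = measure_load_time(["/libs/cq/core/content/welcome.html"] * 3,
                            lambda url: time.sleep(0.01))
```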

Login Page

Welcome Page

Siteadmin

New Page (Geometrixx content page)

Open blank page (Geometrixx content page)

Open Existing page (Geometrixx content page with a SWF, an image, a sitemap and a tag cloud)

Damadmin

Open a PNG DAM Asset

For the above pages, a total of 4 iterations were run:

Single User First Time (No Server/Client Cache)

Single User Second Time

Multiple Users First Time (No Client Cache)

Multiple Users Second Time

Initial System State

The initial system state consists of:

AEM Author Instance with default data

Browser Cache Clean for “First Time” Test Cases

1 Gb/sec line between server and client

Scenario Execution

The scenario was run in 4 iterations. In the first iteration, we started with a clean server cache and a clean client cache, and recorded the load times of the above pages. In the second iteration, we again measured the load times, but without cleaning either cache:

Because of the browser and server-side caches, the load time in the second iteration is much lower (in most cases):

The scenario took 40% less time with browser and server cache.

Another set of iterations was performed, but this time with the server under load from 15 threads (close to the thread count that gave maximum throughput in the Site Creation Scenario). Load times were recorded for both the first and second visit. The readings are as follows:

This time, we can see that the server cache had already been populated by the load of 15 users. We can also see that the browser cache contributed a ~10% improvement between the first and second access.

With this data, we can see how much server cache contributes:

This graph indicates that server cache contributes ~40% to an improvement in site response.

Performance impact of content volume

This test is used to identify how the throughput of an AEM author instance trends as we continuously create 50,000 pages. During the test we record system resources including CPU, heap and repository size, amongst others.

The pages are created using the same script as used in the author Site Creation Scenario. This scenario is recorded with all possible GET and POST requests that a user generates while performing these operations.

Note:

Browser caching is not considered.

Scenario Execution

In this scenario, we created pages in batches. Between batches we did not restart the AEM author instance; instead, we invoked Tar PM optimization manually. The pages were created using the script recorded for the Site Creation Scenario.

Pages were created in the following batches:

Batch    Number of Pages    Total Pages in Repository
1        1,000              1,000
2        2,000              3,000
3        4,000              7,000
4        8,000              15,000
5        16,000             31,000
6        32,000             63,000

The throughput trend observed is as follows.

The above graph shows that the throughput of the application was consistent even after:

Continuous execution of over 36 hours.

An increase in the number of pages in the repository.

Repository growth can be seen in the following trend:

This graph clearly indicates that the more pages are added to the AEM instance, the faster the repository size grows.

In other words, repository growth is not linear. It depends upon how many nodes you are adding between optimizations. When adding 4,000 pages, the repository grows to ~45GB. To reduce disk storage requirements it may be necessary to invoke Tar PM optimization.

The following graph shows the effects of Tar PM optimization on repository growth. Note that the vertical axis in this graph is logarithmic. Although the workload transactions cause very rapid growth in disk space, the optimization recovers the vast majority of that space. Post-optimize growth in repository space usage is much more modest, remaining at ~2GB even for 8,000 pages.

The time taken to run Tar PM optimization follows a fairly clear trend, directly related to the amount of content in the repository.
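Manually triggering Tar PM optimization is typically done through the repository's JMX interface. The sketch below builds the kind of URL the Felix web console accepts for invoking a JMX operation via POST; the helper function is hypothetical, and the console path, MBean name and operation name are assumptions based on a default CRX 2.x setup, so verify them against your own instance before use:

```python
from urllib.parse import quote

def jmx_operation_url(host: str, mbean: str, operation: str) -> str:
    """Build a Felix web console URL for invoking a JMX operation via POST.
    (Hypothetical helper; /system/console/jmx requires admin credentials.)"""
    return f"{host}/system/console/jmx/{quote(mbean, safe='')}/op/{operation}"

# Assumed MBean/operation for Tar PM optimization -- verify on your instance:
url = jmx_operation_url("http://localhost:4502",
                        "com.adobe.granite:type=Repository",
                        "startTarOptimization")
```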

The following chart shows the (approximately) linear relation between the time taken for optimization against the total number of pages:

Test Scenarios - Publish

Publish-oriented scenarios are designed to explore the performance of AEM when servicing end-user requests that do not involve authoring activities. Although publish scenarios may involve data updates, they are much less update- or write-intensive than the author scenarios.

Publish Mixed Scenario (Write Heavy)

Each thread in this scenario performs a mixed set of tasks on the publish environment. The scenario creates a topic, adds comments to it and adds ratings, all while browsing through the Geometrixx website.

This scenario is recorded with all possible GET and POST requests that a user generates while performing these operations. Browser caching behavior is not reproduced by this scenario, so each client request acts like an un-cached initial request.

Definition of Throughput

Throughput in this scenario is defined as transactions per hour. Each transaction has the following steps:

Login to Publish instance.

Create a topic with Subject and description.

Add five comments to that topic.

Navigate to the Products → Circle page.

Add ten ratings on this page.

There will be 16 updates occurring per transaction.

Completion of Scenario

The scenario is marked completed when all steps have been performed for the specified number of iterations.

Scenario Execution

This benchmark scenario was executed with a single publish server. A series of benchmark test runs was executed, starting with a set number of threads and increasing the thread count in each run. Throughput was calculated on every run.

Combining the results from multiple runs, we obtained the following results:

This graph shows that we obtained the maximum throughput (of just over 600 scenario transactions per hour) with ~16 threads.

Scenario Throughput: 619 Transactions/Hour

There are 16 updates in each transaction, hence updates per hour equate to approximately 9,900 Updates/Hour.

Although there was some variation in throughput at different load levels, the server was able to deliver above 450 transactions per hour (or ~7,200 updates per hour) at all load levels that were attempted.

The following trend of system resources was seen when scaling the number of threads:

Heap utilization varies slightly with thread concurrency but, in the main, sits just over 90% of the 4,096 MB heap when heavily loaded. CPU utilization is slightly higher with this publish workload than with pure authoring.

Operation at maximum throughput

Disk throughput remained at ~50 transactions per second, which was far below the maximum capability of the disks on this server.

Site Browsing Scenario (Read Heavy)

Each thread in this scenario browses the Geometrixx website. Each thread browses two sets of pages 5 times each and posts a comment on an article.

This scenario is recorded in two ways: one with all possible GET and POST requests that a user generates while performing these operations; in the second recording, we permit the browser cache to operate and only record the page loads that a normally operating browser would generate upon a second visit to a website. The script for the second case is far smaller than the first, because a major proportion of the content is served from the browser cache.

Definition of Throughput

Throughput in this scenario is defined as transactions per hour. Each transaction has the following steps:

Login to the publish instance.

Browse a first set of 5 pages, performing 5 page loads on each one; a total of 25 page loads.

Browse a second set of 5 pages, performing 5 page loads on each one; 25 more page loads.

Comment on an article.

Altogether, this scenario performs 50 page loads, each one either cached or non-cached according to the variation. A single update is performed in each iteration.

Completion of Scenario

The scenario was marked completed when all steps had been performed for the specified number of iterations.

Scenario Execution - No Browser Cache

In this execution we used a script in which we recorded all possible GET and POST requests.

Approximately 18 runs were executed. We started with 8 threads and increased the thread count in each run. Throughput was calculated on every run; after 18 runs, we obtained the following readings:

This graph shows that maximum throughput was reached at ~30 threads. Each transaction includes browsing two sets of 5 pages and adding a comment. This throughput is measured without any sleep/wait time.

Scenario Throughput: 507.5 Transactions/Hour

Page View Throughput: ~24,400 Page Views/Hour

The following trend of system resources was seen when scaling the number of threads:

Scenario Execution - With Browser Cache

In this execution, we used the script in which the browser cache is permitted to operate, so only the requests a browser would generate on a second visit were recorded.

A total of 9 runs were executed, starting with 8 threads and increasing the thread count in each run. Throughput was calculated on every run. After 9 runs, we obtained the following readings:

This graph shows that maximum throughput was reached at ~17 threads. Each transaction includes 50 page views and adding a comment. This throughput is measured without any sleep/wait time.

Scenario Throughput: 1,848 Transactions/Hour

Page View Throughput: ~92,400 Page Views/Hour

The following trend of system resources was seen when scaling the number of threads:

Effects of browser cache

If we assume that users are returning to the site, or browsing pages multiple times, then the browser cache will serve a portion of the page loads.

These tests show that read-heavy scenarios can give 3.6 times more throughput than scenarios where we do not consider the browser cache and replay all static requests.

Allowing for browser cache operation, we obtain over 92,000 page views per hour. This is far more than the ~24,000 page views per hour seen when we exclude the effects of the browser cache.
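The 3.6× figure follows directly from the two measured throughputs (a trivial check, using the numbers reported above):

```python
cached = 1848       # transactions/hour with browser cache in operation
uncached = 507.5    # transactions/hour replaying all static requests
speedup = cached / uncached
# round(speedup, 1) -> 3.6
```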

Test Scenarios - Mixed

These scenarios combine the author and publish scenario types to explore the performance seen when authoring and activation are occurring at the same time as end-user browsing.

Increasing Publish Instance count

The purpose of this benchmark test is to look for scalability issues that occur when the author node must activate and replicate updates to more than one publish instance. The author Site Modification Scenario is replayed here with an increasing number of publish instances, to investigate how author throughput and replication processing behave as publish instances are added. The test was run with 1, 2 and 3 publish instances.

The author Site Modification Scenario is a mixed scenario, where one thread creates 2 pages and modifies 9 pages. Each iteration has a total of 11 activations.

Here we can see that the throughput trends of all three scenarios are quite similar to one another. The peak throughput of just over 300 scenarios per hour is virtually the same. Under increased load with additional threads, the three-node configuration showed slightly less decline; however throughput exceeded 200 scenarios per hour in all cases.

The following graph shows the impact of workflow processing:

Here we can see similar trends in the average processing time for all runs. The background processing associated with page activation follows virtually the same pattern irrespective of whether one, two or three publish nodes are present.

There is a very slight increase in the trend for average processing time when there is an increase in the number of publish instances. In this chart you can see that the time increases from 135ms with one node, to just above 140ms with three nodes.

Benchmark Environment

Hardware Environment

All of the reported benchmarks are run on a set of identical back-office server-type systems: Hewlett Packard model DL360-G7 servers, configured as follows:

Operating System    RHEL 5.5
CPU                 Intel(R) Xeon(R) CPU E5649 @ 2.53GHz
Memory              32 GB
Disk Controller     SAS RAID controller
Disk                4 × 146GB 15,000 RPM SAS
JVM Arguments       -Xms512m -Xmx4096m -XX:MaxPermSize=256m

This system has a very high-performance disk subsystem, in which small, very fast disks are organized in a RAID0 stripe set. There is no redundancy, and maximum performance is obtained. Arguably, this is not a configuration a production environment would use; however, our RAID0 configuration mimics, at a much lower setup cost, the performance of the far more expensive 8-volume RAID0+1 configuration that might well be used in production.

Initial System State

Some aspects of system performance may depend, in part, on the amount of content present in the AEM repository. The benchmark tests reported here are performed with significant baseline content, as follows:

Content Base Load - 50,000 Pages

Repository Size - 28.91 GB

Tar PM Optimization - done

Index merge - done

System Cache - cleaned

Topology

Methodology

Step Test

Each benchmark test is performed as a step-test in which incremental load levels are placed on the server to discover the point at which maximum throughput is obtained.

A performance test harness orchestrates the thread-scaling exercise. First, we define all the thread counts to be tested for each core count. We then measure throughput for each thread count and use the thread count with maximum throughput for comparisons.

For the author exercise, we used the following inputs:

Core    Client thread counts tested to find maximum throughput
8       4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14
16      8, 9, 10, 11, 12, 13, 14, 16, 18
24      8, 10, 11, 12, 13, 14, 16, 20

Between runs at each thread count, the whole server is refreshed and restored from backup. The performance harness is also responsible for measuring system resources on every run.
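The step-test loop can be sketched as follows (a simplified illustration; the real harness also restores the server between runs and records system metrics, which is omitted here, and `run_transaction` stands in for one full scenario iteration):

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable

def step_test(thread_counts: Iterable[int],
              run_transaction: Callable[[], None],
              iterations_per_thread: int = 10) -> Dict[int, float]:
    """Run the workload at each thread count; return transactions/hour per count."""
    results = {}
    for n in thread_counts:
        total = n * iterations_per_thread
        start = time.perf_counter()
        with ThreadPoolExecutor(max_workers=n) as pool:
            futures = [pool.submit(run_transaction) for _ in range(total)]
            for f in futures:
                f.result()                      # propagate any failures
        elapsed = time.perf_counter() - start
        results[n] = total / elapsed * 3600     # transactions per hour
    return results

# Stand-in workload: each "transaction" just sleeps briefly.
throughputs = step_test([4, 8, 14], lambda: time.sleep(0.001))
best = max(throughputs, key=throughputs.get)    # thread count at max throughput
```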

During the run, the throughput, disk utilization, heap utilization and CPU utilization are monitored using JMX, iostat and other related tools. The actual scenario data for a sample run (author scenario on 24 CPUs) is shown below.

The critical metrics are determined for the overall run. Here, the mean CPU utilization is just under 20% and the mean disk utilization is just under 2.5%. The tables in the main section are constructed from data collected in this way, for each different CPU count and scenario type.

Definition of Update

An update is defined as any user action that results in either:

The creation of a new page or DAM asset.

The modification of an existing page or DAM asset.

Updates-per-hour values (as reported in the tests above) correspond to the number of pages created or updated within an hour.