Adobe has conducted a series of performance and capacity tests to benchmark the behavior of CQ DAM in a variety of common configurations. These are intended to help readers estimate the capacity and scalability of proposed systems.

Setup Details

Hardware Configuration

All of the reported benchmarks are run on a set of identical back-office server type systems. The servers are Hewlett Packard model DL360-G7, configured as follows:

| Component | Specification |
| --- | --- |
| Operating system | RHEL 5.5 |
| CPU | Intel(R) Xeon(R) CPU E5649 @ 2.53 GHz |
| Memory | 32 GB |
| Disk controller | SAS RAID controller |
| Disk | 4 x 146 GB 15,000 RPM SAS |
| Java version | Java 1.6.0_26 (64 Bit) |

Local Storage

This system has a very high performance disk subsystem for local storage in which small, very fast disks are organized in a RAID0 stripe set. The disk configuration is designed to mimic the performance of an 8 volume RAID0 + 1 configuration suitable for production.

Storage Area Network

The bulk storage for DAM is located on a high performance EMC storage area network and network attached storage appliance. The DAM storage is accessed via NFS. The application servers and SAN server are connected to a dedicated storage network via bonded pairs of gigabit Ethernet connections providing 2 GBit/sec effective throughput to the storage server. The EMC storage appliance is dedicated to the application benchmarks when these are running, so no concurrent load from other applications can occur.

Topology

The following figure shows the network configuration used for the performance benchmarks:

Software Configurations

All of the software configurations were default/standard, except for the following settings, which were changed specifically for the benchmarks.

| Custom configuration | Value | Description |
| --- | --- | --- |
| Memory arguments | -Xms512m -Xmx4096m -XX:MaxPermSize=512m | The benchmark scenarios require high memory settings; otherwise the server might run out of memory. |
| Number of workflow threads | 6 | The default number of workflow threads equals the number of CPUs recognized by the operating system, which is 24 in this case because the server has 12 cores and 12 additional hyperthreads. This number should be adjusted according to the CPU utilization observed on the server and the desired balance between foreground and background processing: if every available processor is busy processing workflows, workflow throughput increases but foreground processing might slow down. The setting used here allows up to 6 threads, which, ignoring hyperthreads, represents about half of the CPUs available. |
| TAR PM optimization | Off | Automatic execution of TAR PM optimization, which is scheduled to run between 2 AM and 5 AM, was turned off during the benchmark runs so that it could not affect benchmark execution. |
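The reasoning behind the workflow thread setting can be sketched as a small calculation. This is only an illustration of the arithmetic: the function name, the assumption of two hardware threads per core and the 50% share are ours and are not CQ settings.

```python
def workflow_threads(logical_cpus: int, threads_per_core: int = 2, share: float = 0.5) -> int:
    """Return a workflow thread count that uses `share` of the physical cores."""
    # Ignore hyperthreads: count physical cores only.
    physical_cores = logical_cpus // threads_per_core
    # Never go below one thread, even on very small machines.
    return max(1, int(physical_cores * share))


# Benchmark server: 24 logical CPUs (12 cores plus 12 hyperthreads).
print(workflow_threads(24))  # 6
```

A larger share favors background workflow throughput; a smaller one leaves more CPU for foreground requests.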

FFmpeg Configurations

DAM requires FFmpeg to be installed on the server for processing of video assets. The version used in the reported benchmarks was version 0.10.2.

| Custom configuration | Value | Description |
| --- | --- | --- |
| Codecs | Audio codec changed to alac | Audio and video codecs need to be changed according to the FFmpeg build installed on the server. |
| Encoding | 1-pass | The benchmarks presented here used the non-default 1-pass setting owing to a functional problem encountered with the default 2-pass setting. |
| Number of threads spawned by FFmpeg process | 2 | The default setting drove CPU utilization up to 100%, so the number of threads spawned by FFmpeg was restricted to 2. |

Data Volumes

All of the DAM performance benchmarks are executed with a consistent initial quantity of data, and using data of a consistent composition and folder structure.

Distribution

On the first level, there are two folders, each containing 10 folders on the second level. Each of these second-level folders contains 300 DAM assets on the third level, of which 75% are JPG files, 20% are TIFF files and 5% are MP4 files. In addition to the base load, one extra folder is created containing 5 files of each type (JPG, TIFF and MP4). This folder is used for benchmarking View operations. It is created at the second level, so the first of the two top-level folders contains 11 folders.

| Type of File | Size | Count | Percentage of assets by count | Percentage of assets by size |
| --- | --- | --- | --- | --- |
| JPG | 2 MB | 4,500 | 75% | 16.3% |
| TIFF | 28.7 MB | 1,200 | 20% | 59.4% |
| MP4 | 47 MB | 300 | 5% | 24.3% |
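As a cross-check, the composition of the base load can be recomputed from the nominal per-file sizes and counts listed above. This is a minimal sketch. The sizes are rounded values, so the shares by size it produces differ by about a percentage point or less from the figures in the table, which were presumably derived from exact file sizes.

```python
# Nominal per-file sizes (MB) and counts for the base load.
BASE_LOAD = {
    "JPG": {"size_mb": 2.0, "count": 4500},
    "TIFF": {"size_mb": 28.7, "count": 1200},
    "MP4": {"size_mb": 47.0, "count": 300},
}


def composition(load):
    """Return (total_count, total_mb, {type: (share_by_count, share_by_size)})."""
    total_count = sum(v["count"] for v in load.values())
    total_mb = sum(v["size_mb"] * v["count"] for v in load.values())
    shares = {
        name: (v["count"] / total_count, v["size_mb"] * v["count"] / total_mb)
        for name, v in load.items()
    }
    return total_count, total_mb, shares


total_count, total_mb, shares = composition(BASE_LOAD)
print(total_count, round(total_mb))
for name, (by_count, by_size) in shares.items():
    print(f"{name}: {by_count:.1%} by count, {by_size:.1%} by size")
```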

Uniqueness

DAM has the ability to detect duplication of assets, for example the same image stored under multiple names, and it optimizes the storage by avoiding multiple copies of the same object. For the purposes of performance benchmarking, however, it is necessary to have a fixed set of assets, so that benchmarks are repeatable and consistent from run to run. In order to defeat the detection of duplicate assets in performance benchmarks, a strategy is needed to make the assets unique from other instances, to ensure that the operation of DAM is representative of real world operations with unique assets.

A sample asset is taken and its title and artist metadata are modified to create unique copies of the same asset. The title is unique for each asset. The artist property is chosen randomly from a set of 250 words. This property is used for full text searching in the search benchmark case.

Image files are altered using an EXIF tool. For videos, custom Java code is used to replace some metadata tags (both XMP and non-XMP) with random dictionary words.
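The metadata scheme described above can be sketched as follows. This is not the tooling used for the benchmarks (which relied on an EXIF tool and custom Java code); the function name, the title format and the placeholder vocabulary are assumptions made for illustration.

```python
import random


def make_metadata(count, artist_words, seed=0):
    """Build `count` metadata records with unique titles and random artists."""
    rng = random.Random(seed)  # a fixed seed keeps the data set repeatable
    return [
        {"title": f"asset-{i:05d}", "artist": rng.choice(artist_words)}
        for i in range(count)
    ]


# Placeholder vocabulary standing in for the 250 words used in the benchmark.
words = [f"word{n}" for n in range(250)]
records = make_metadata(6000, words)
print(records[0])
```

Seeding the random generator is a design choice that keeps the generated set identical from run to run, in line with the repeatability requirement stated above.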

Tags

Ten tags are applied to each asset. These tags are chosen from the subset of predefined tags that are shipped with the default CQ installation. Nine of these tags are chosen randomly while one tag is fixed. This is done to segregate the 6,000 assets uploaded as the base load from any assets uploaded to the repository thereafter, so that search results are not affected by the newly uploaded benchmark assets. The subset of tags applied to the base load and the subset of tags applied to assets uploaded as part of the benchmarks are mutually exclusive.
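A minimal sketch of this tagging rule, assuming a hypothetical tag pool and marker tag name (the actual CQ tag identifiers are not given in this document):

```python
import random


def pick_tags(tag_pool, marker_tag, rng, random_count=9):
    """Return the fixed marker tag plus `random_count` distinct random tags."""
    # Exclude the marker so it cannot be drawn a second time.
    candidates = [t for t in tag_pool if t != marker_tag]
    return [marker_tag] + rng.sample(candidates, random_count)


rng = random.Random(42)
pool = [f"tag-{n}" for n in range(50)]        # hypothetical tag names
tags = pick_tags(pool, "baseline-load", rng)  # "baseline-load" is a made-up marker
print(len(tags), tags[0])
```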

Initial System State

Several of the benchmarks involve the addition of assets to DAM. In each benchmark case, the following initial conditions are established as a baseline.

A total of 6,000 unique DAM assets with the composition defined above are present.

Workflow processing is completed for all the uploaded assets.

Datastore garbage collection is done.

Indices are merged and TAR PM optimization is complete. Please note that even when automatic scheduling of TAR PM is turned off, manually initiated TAR PM optimizations are performed when required.

System cache is cleaned.

Batch Uploading and Tagging Scenario

This benchmark scenario is designed to simulate a typical bulk load of DAM, where large numbers of assets must be inserted in rapid succession. Other application workloads that consist of large numbers of asset insertions are also simulated. In this workload each thread uploads an asset and applies 10 tags to it.

Implementation Details

A total of 3,000 assets containing a mix of images and videos are uploaded into DAM. They are uploaded into a 3-level structure. On the first level, the number of folders is equal to the number of threads, so that each thread works on a unique folder. On the third level, there is a maximum limit of 300 files. The number of folders on the second level is therefore controlled in part by the number of threads.

Amongst the uploaded assets, the ratio of each file type is the same as defined above in the Data Volumes section. The tags to be applied are chosen from a subset of the tags that are available with the default CQ installation.
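One way to read the folder rule is sketched below. It assumes that assets are split evenly across threads and that each thread fills second-level folders up to the 300-file limit; the source does not state either detail, so the function is an interpretation rather than the benchmark's actual layout logic.

```python
import math


def upload_layout(total_assets, threads, max_files_per_folder=300):
    """Return (first_level_folders, second_level_folders_per_thread)."""
    # Assumption: every thread uploads an equal share of the assets.
    assets_per_thread = math.ceil(total_assets / threads)
    second_level = math.ceil(assets_per_thread / max_files_per_folder)
    return threads, second_level


for threads in (1, 8, 40):
    print(threads, upload_layout(3000, threads))
```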

Definition of Throughput

Whenever an asset is uploaded to DAM, the asynchronous workflow operations are triggered to run in the background to create asset renditions and extract metadata from the asset. Because workflow execution is asynchronous to the initial upload operation and typically takes longer to complete, the background execution of these workflows must be monitored in order to assess the system throughput.

This benchmark is designed to measure the throughput in transactions per minute. Two different throughput measurements are made:

Synchronous operation throughput, where the time measured is the time taken for upload and tagging to complete.

Overall throughput, where the time taken to complete asynchronous workflow execution is also considered.

The scenario workload consists of a sequence of transactions, each having the following steps:

Upload 15 JPG files and tag them (75%)

Upload 4 TIFF files and tag them (20%)

Upload 1 MP4 file and tag it (5%)

The mixture of assets follows the same size and type percentages as outlined in the Data Volumes section. Each transaction uploads 20 files, and the resulting throughput can be expressed in either transactions per minute or files per minute.

The benchmark scenario is considered completed when:

The synchronous upload and tagging operations are finished.

No more asynchronous workflows are waiting to be processed in Sling Events.
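The second completion condition implies polling the workflow queue until it is empty. A generic sketch of such a loop is shown below. The `pending_jobs` callable is hypothetical: how the number of waiting Sling Events jobs is actually obtained is not described in this document and is left to the caller.

```python
import time


def wait_for_workflows(pending_jobs, poll_seconds=30.0,
                       timeout_seconds=6 * 3600, sleep=time.sleep):
    """Poll `pending_jobs()` until it returns 0; return the elapsed seconds."""
    start = time.monotonic()
    while pending_jobs() > 0:
        if time.monotonic() - start > timeout_seconds:
            raise TimeoutError("workflow queue did not drain in time")
        sleep(poll_seconds)
    return time.monotonic() - start


# Usage with a stand-in queue that drains after two polls.
depths = iter([3, 2, 0])
elapsed = wait_for_workflows(lambda: next(depths), sleep=lambda s: None)
print(f"queue empty after {elapsed:.3f} s")
```

The elapsed time returned here, added to the synchronous phase, is what the overall throughput measurement is based on.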

Finding Thread Count for Maximum Throughput

Throughput in this benchmark varies with the number of concurrent threads making synchronous asset upload and tagging requests. The benchmark was repeated with increasing thread counts in order to find the point where maximum throughput was obtained.

This graph shows that synchronous operation throughput increases up to 8 threads and then becomes fairly constant. Maximum throughput was about 8.9 transactions per minute, with 40 request threads. Overall throughput remains close to 0.8 transactions per minute at all concurrency levels. Please note that the upload of assets finishes within about 20 minutes, whereas it takes close to three hours for workflows to complete.
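The durations quoted above follow from the throughput figures. A short worked calculation, using only numbers from this section (3,000 assets, 20 files per transaction):

```python
FILES_PER_TRANSACTION = 15 + 4 + 1  # JPG + TIFF + MP4
TOTAL_ASSETS = 3000


def minutes_to_finish(transactions_per_minute, total_assets=TOTAL_ASSETS):
    """Time needed to push all assets through at the given transaction rate."""
    transactions = total_assets / FILES_PER_TRANSACTION  # 150 transactions
    return transactions / transactions_per_minute


sync_minutes = minutes_to_finish(8.9)     # synchronous upload and tagging
overall_minutes = minutes_to_finish(0.8)  # including workflow completion
print(round(sync_minutes, 1), round(overall_minutes, 1))
print(round(0.8 * FILES_PER_TRANSACTION * 60), "files per hour overall")
```

The results, roughly 17 minutes and a little over three hours, are consistent with the approximate durations reported for the run.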

The following graph shows the growth and eventual completion of the queue of asynchronous workflows over time:

The graph shows that the time taken for the queue to empty increases only marginally as the number of threads increases. The two vertical marker lines denote the time when the synchronous transaction submission phase was completed and the time when the background workflow processes were completed, for the representative run of 30 threads.

The following graph shows workflow event processing statistics for the benchmark.

This graph indicates that, as the number of threads is increased, the average waiting time and the average processing time increase at first but then become constant.

System Resources Trend

The following chart illustrates the observed system resource utilization for each of the benchmark runs.

The peak CPU utilization is approximately 70% for all of the runs. CPU utilization is not a bottleneck. Heap utilization varies, but does not exceed 95% of the configured 4,096 MB heap.

The chart above plots the maximum data transfer rates over the SAN interface. The maximum data transfer rate to the EMC SAN server, as measured using the IOZONE benchmark, is about 40 MB/sec. We can therefore conclude that the EMC SAN and/or the storage network are fully saturated in this scenario once eight or more request threads are used.

Repository Growth

The bulk load scenario is used to measure the growth of the repository storage with the addition of a large number of assets.

The datastore increases by about 41 GB with the upload of 3,000 assets. The increase in datastore size depends on the total files written and hence does not vary with the number of threads. The total size of the data being uploaded is 28 GB, so the storage requirement for the datastore is about 1.5 times the size of the uploaded data.

The repository directory (excluding the data store) grew by about 15 GB with the upload of 3,000 assets, that is, by about 5 MB per asset. The size of the repository directory does vary slightly with the number of threads, because the hierarchy into which the data is uploaded depends on the number of threads, but the variation is under 200 MB in total.

After the benchmarks were finished, a TAR PM optimization was performed to measure the final increase in the size of the repository. The optimization took 47 minutes to run and reduced the repository size by 14 GB, so the net increase in the size of the repository is about 1 GB. This means that the net increase in repository size per asset is about 0.3 MB.
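The ratios quoted in this section can be reproduced from the measured sizes. The sketch below assumes decimal units (1 GB = 1,000 MB), which the source does not specify; with binary units the per-asset figures change only slightly.

```python
UPLOADED_ASSETS = 3000
UPLOADED_GB = 28
DATASTORE_GROWTH_GB = 41
REPO_GROWTH_GB = 15     # repository directory, before TAR PM optimization
REPO_RECOVERED_GB = 14  # space reclaimed by TAR PM optimization
MB_PER_GB = 1000        # assumption: decimal units

datastore_ratio = DATASTORE_GROWTH_GB / UPLOADED_GB
repo_mb_per_asset = REPO_GROWTH_GB * MB_PER_GB / UPLOADED_ASSETS
net_mb_per_asset = (REPO_GROWTH_GB - REPO_RECOVERED_GB) * MB_PER_GB / UPLOADED_ASSETS

print(f"datastore growth / uploaded data: {datastore_ratio:.2f}")
print(f"repository growth before optimization: {repo_mb_per_asset:.1f} MB per asset")
print(f"net repository growth: {net_mb_per_asset:.2f} MB per asset")
```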

Repository Growth Pattern with Upload of Assets

Since the growth of the repository folder with respect to uploaded data is not linear, a separate exercise was done to study the pattern of repository growth with continuous asset upload. Successively larger sets of assets, from 600 to 3,000 items, with the same distribution of file types and structure as defined in the Data Volumes section, were uploaded to a clean repository. After workflow processing was complete, the repository size and datastore size were measured and the repository was then optimized by a TAR PM optimization. The repository size was then measured again to determine the net increase after optimization.

The following graph shows how the sizes of the repository and datastore increase. Please note that both axes are logarithmic.

This chart illustrates some important aspects of filesystem usage by a large scale DAM implementation:

All storage requirements are proportional to the amount of content present in DAM.

Most of the storage used is in the data store part, which contains anything above a certain size threshold. TAR PM optimization does not affect this storage.

Repository storage grows very rapidly while loading (and via workflows, rendering) content, but a TAR PM optimization will recover most of this space.

One critical issue to keep in mind, when sizing the storage requirements of the repository, is to understand not just the steady-state requirements, but to accommodate all of the storage needed between TAR PM optimizations. Normally these are scheduled for once every 24 hours.
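A rough sizing sketch for this point is shown below. It assumes that the per-asset figures measured in the bulk load benchmark (about 5 MB of repository growth per asset before optimization, about 0.3 MB after) carry over to the system being sized, which may not hold for other asset mixes or rendition designs. The function and parameter names are illustrative.

```python
def peak_repository_gb(existing_assets, assets_per_day,
                       transient_mb=5.0, net_mb=0.3):
    """Estimate repository size (GB, excluding the datastore) just before
    the next TAR PM optimization, assuming one optimization per day."""
    steady_state_mb = existing_assets * net_mb        # already optimized content
    todays_growth_mb = assets_per_day * transient_mb  # not yet optimized
    return (steady_state_mb + todays_growth_mb) / 1000


# Example: the 6,000-asset baseline plus one day's upload of 3,000 assets.
estimate = peak_repository_gb(existing_assets=6000, assets_per_day=3000)
print(f"{estimate:.1f} GB")
```

In this example the day's unoptimized uploads account for most of the estimate, which is why sizing for the steady state alone can underestimate the space required.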

The graph shows that the growth in repository size and slope of curve would depend on the count as well as the type of assets uploaded to the repository. In addition, the design of the DAM application would affect what renditions are created for assets. The scenario reported in this document may not be representative of all situations.

Read Only Scenario

This scenario is designed to simulate a read-only usage pattern for DAM, consisting of search and retrieval operations.

Implementation Details

Each thread in this scenario will perform three view and seven search operations.

View

The baseline data for the DAM benchmarks includes a set of 15 files that are used for view operations. The content is described in the Data Volumes section.

View operations are performed on the 15 files (5 of each type) present in the view folder and each operation chooses a file randomly when executing view benchmark case.

The mixture of files retrieved in the view operation is one third JPG, one third TIFF and one third MP4.

Search

The following three kinds of queries are included in the benchmark:

TYPE 1: Full Text Search AND search on Tags.

TYPE 2: Search on the basis of tags.

TYPE 3: Full Text Search AND search on Tags AND search on File Type.

Five different queries of each type are recorded and the scenario selects randomly from amongst these for each transaction.

The search criteria are constructed to ensure a balance of searches that find nothing, searches that find a small number of results and searches that return many matches. Every search query contains an AND clause that requires the unique tag applied to the initial baseline of 6,000 assets. This is done so that the results of search queries remain the same as additional assets are uploaded as part of the bulk loading benchmarks.
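The effect of ANDing every query with the baseline tag can be illustrated with a small in-memory stand-in. This is not the CQ query API; the record layout, the `search` function and the tag names are assumptions used only to show why the result set stays stable as new assets arrive.

```python
def search(assets, text=None, tags=(), file_type=None, baseline_tag="baseline-load"):
    """Return titles of assets matching every criterion AND the baseline tag."""
    required = set(tags) | {baseline_tag}
    return [
        a["title"] for a in assets
        if required <= set(a["tags"])                       # tag search
        and (text is None or text in a["artist"])           # full text search
        and (file_type is None or a["type"] == file_type)   # file type filter
    ]


baseline = [
    {"title": "a1", "artist": "river", "type": "JPG", "tags": ["baseline-load", "nature"]},
    {"title": "a2", "artist": "stone", "type": "TIFF", "tags": ["baseline-load", "nature"]},
]
before = search(baseline, text="river", tags=["nature"])  # TYPE 1 style query

# An asset uploaded later lacks the baseline tag, so it never matches.
later = baseline + [{"title": "b1", "artist": "river", "type": "JPG", "tags": ["nature"]}]
after = search(later, text="river", tags=["nature"])
print(before, after)
```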

Definition of Throughput

The unit of work for this benchmark is the following scenario that performs a representative mixture of operations. The throughput is measured in scenarios completed per second. It is also possible to express throughput in searches per second or views per second, based on this mixture of operations.

| Transaction Definition | Description |
| --- | --- |
| 1 | View TIFF file |
| 2 | Search operation of TYPE1 |
| 3 | Search operation of TYPE1 |
| 4 | View JPG |
| 5 | Search operation of TYPE2 |
| 6 | Search operation of TYPE2 |
| 7 | View MP4 file |
| 8 | Search operation of TYPE3 |
| 9 | Search operation of TYPE3 |
| 10 | Search operation randomly chosen from the three types |

This view and search scenario does not contain any asynchronous or background operations, so the throughput can be measured simply based on the time taken to perform the synchronous workload.

Finding Thread Count for Maximum Throughput

Throughput in this benchmark varies with the number of concurrent threads performing view and search scenarios. The benchmark was repeated with increasing thread counts in order to find the point where maximum throughput was obtained.

This graph shows that the maximum throughput was seen at around 40 threads, although throughput is roughly constant above 30 threads. The maximum is a throughput of 1.4 transactions per second. Each transaction includes seven searches and three view operations.

The throughput capability of the system was:

5,040 transactions per hour

15,100 views per hour

35,300 searches per hour
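These hourly figures follow directly from the transaction rate and the mix of three views and seven searches per transaction. A minimal conversion sketch (the function name is ours):

```python
def hourly_rates(transactions_per_second, views_per_tx=3, searches_per_tx=7):
    """Convert a transaction rate into hourly transaction, view and search counts."""
    tx_per_hour = transactions_per_second * 3600
    return {
        "transactions": round(tx_per_hour),
        "views": round(tx_per_hour * views_per_tx),
        "searches": round(tx_per_hour * searches_per_tx),
    }


rates = hourly_rates(1.4)
print(rates)  # {'transactions': 5040, 'views': 15120, 'searches': 35280}
```

The view and search values match the reported figures once rounded to three significant digits.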

System Resources Trend

The graphs below show average CPU usage, heap usage and average data transfer rates over the SAN interface during the benchmark runs.

These graphs indicate that none of the critical server resources were heavily utilized. SAN utilization is low because the small set of 15 files used for view operations is principally served through the system cache.

These graphs show that pure read operations do not draw extensively on system resources. The ultimate throughput limit in this benchmark was the network bandwidth available on the load generation network. At a rate of 1.4 scenarios per second, content is returned by the server at a rate of about 100 MB per second, saturating the available 1 GBit Ethernet link.
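The bandwidth figure can be approximated from the nominal file sizes in the Data Volumes section. The sketch assumes that each view returns the full original file and that search responses are negligible in size; neither assumption is stated in the source.

```python
# Nominal sizes (MB) of the three files viewed in each scenario.
VIEW_SIZES_MB = {"JPG": 2.0, "TIFF": 28.7, "MP4": 47.0}


def view_traffic_mb_per_s(scenarios_per_second):
    """Approximate data returned per second by view operations alone."""
    return scenarios_per_second * sum(VIEW_SIZES_MB.values())


link_capacity_mb_per_s = 1000 / 8  # 1 Gbit/s in MB/s, ignoring protocol overhead
traffic = view_traffic_mb_per_s(1.4)
print(f"{traffic:.0f} MB/s of {link_capacity_mb_per_s:.0f} MB/s")
```

Under these assumptions the views alone account for close to 90% of the theoretical link capacity, which is consistent with the link being the limiting factor.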

By reconfiguring the network to present load via both the normal load generation and storage networks, throughput increased from 1.35 TPS to 2.2 TPS. The throughput capability of the system using this configuration was therefore:

7,900 transactions/hour

23,800 views per hour

55,400 searches per hour

Mixed Usage Scenario

The purpose of this scenario is to combine the read and write operations of the other scenarios into a balanced one, more representative of a day to day usage pattern for DAM.

Implementation Details

Each thread in this scenario will perform six searches, three views and one upload operation. The view and search portions of the scenario are similar to the sequences described for the Read Only scenario above.

View

The baseline data for the DAM benchmarks includes a set of 15 files that are used for view operations. The content is described in the Data Volumes section. The mixture of files retrieved in the view operation is one third JPG, one third TIFF and one third MP4.

Search

As with the Read Only scenario, three types of queries and five of each type are selected randomly to measure search performance. Unlike the Read Only scenario, only six of the scenario steps are searches.

Upload

Each transaction has one asset upload step, in which an asset is uploaded and tagged. The operation performed is similar to that described for assets uploaded in the Batch Uploading and Tagging scenario above.

Assets are selected for upload following the ratio of file types described in the Data Volumes section. JPG files are uploaded in 75% of transactions, TIFF files in 20% and MP4 files in 5%.
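One simple way to obtain this mix is weighted random selection, sketched below. The source does not say whether the benchmark picks file types randomly or in a fixed rotation, so this is only one possible realization.

```python
import random
from collections import Counter

FILE_TYPES = ["JPG", "TIFF", "MP4"]
WEIGHTS = [75, 20, 5]  # percentage of transactions per file type


def pick_upload_type(rng):
    """Choose the file type for one upload according to the target ratio."""
    return rng.choices(FILE_TYPES, weights=WEIGHTS, k=1)[0]


rng = random.Random(7)
counts = Counter(pick_upload_type(rng) for _ in range(10_000))
print(counts.most_common())
```

Over a large number of transactions the observed shares converge on 75%, 20% and 5%, although any single short run will deviate somewhat.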

The distribution of uploaded assets within the folder structure differs from that of the Batch Uploading and Tagging scenario. Rather than using thread-level folders, the benchmark folder directly contains folders, each of which will be the target for 300 assets. The following diagram depicts the asset distribution structure for this mixed scenario:

Definition of Throughput

The unit of work for this benchmark is the following scenario that performs a representative mixture of operations. As in the Batch Uploading and Tagging scenario, two types of throughput are measured:

Synchronous operation throughput, where the time measured is the time taken for upload and tagging to complete.

Overall throughput, where the time taken to complete asynchronous workflow execution is also considered.

| Transaction Definition | Description |
| --- | --- |
| 1 | Upload asset |
| 2 | View TIFF file |
| 3 | Search operation of TYPE1 |
| 4 | Search operation of TYPE1 |
| 5 | View JPG |
| 6 | Search operation of TYPE2 |
| 7 | Search operation of TYPE2 |
| 8 | View MP4 file |
| 9 | Search operation of TYPE3 |
| 10 | Search operation of TYPE3 |

The benchmark scenario is considered completed when:

The synchronous upload and tagging operations are finished.

No more asynchronous workflows are waiting to be processed in Sling Events.

Finding Thread Count for Maximum Throughput

Throughput in this benchmark varies with the number of concurrent threads making requests to the server. The benchmark was repeated with increasing thread counts in order to find the point where maximum throughput was obtained.

This graph shows that synchronous operation throughput increases up to about 40 threads and then becomes fairly constant. Maximum throughput was about 1.2 transactions per second. Workflow completion throughput remains close to 0.3 transactions per second.

Maximum synchronous operation throughput was:

4,360 transactions per hour

13,100 views per hour

26,200 searches per hour

Maximum overall throughput was:

1,010 transactions per hour, or 1,010 assets per hour.

Despite the presence in this scenario of a large amount of read and search operations, the overall capacity of the server to accept and generate renditions of new assets is about the same, at just over 1,000 per hour, as the rate seen in the Batch Upload scenario.

The following graph shows the growth of the overall queue over time:

Since workflow completion throughput remains fairly constant and the number of transactions also remains almost constant, the total time taken in each run is the same. The above graph shows the trend of queue growth with varying numbers of threads. The two vertical marker lines denote the time when the benchmarks finished and the time when workflow processing completed, for the representative run of 30 threads.

The following graph shows workflow event processing statistics for the benchmark:

The above graph indicates that, as the number of threads is increased, the average waiting time and the average processing time increase at first but then become constant. The average processing time is the same as seen in the batch uploading scenario, indicating that the time taken to generate asset renderings is about the same. The average waiting time in the queue, however, is lower, because the overall number of assets waiting to be processed in this scenario is much smaller than in the batch upload.

System Resources Trend

The following chart illustrates the observed system resource utilization for each of the benchmark runs.

The graph above shows average CPU and heap utilization during the benchmark runs for different thread counts. Heap utilization climbs slightly at thread counts up to 30, but remains below 55% of the 4,096 MB heap for all of the runs. CPU utilization is approximately 45% at all load levels.

The graph above shows average data transfer rates over the SAN interface during the benchmark runs for different thread counts. Overall SAN traffic is much higher than seen with the Read Only scenario because the insertion of assets and processing of workflow tasks makes more use of the repository storage.

Scaling Data Volume

The purpose of this scenario is to measure how the performance of DAM changes when additional quantities of content are present.

Implementation Details

In this scenario, the baseline data load varies from 6,000 assets to 24,000 assets, and the Mixed Usage scenario transaction load is used to measure the effect on throughput as the number of assets is increased.

The data is arranged so that search queries in the benchmark case return the same result set, independently of the volume of assets. This was achieved by tagging every asset in the first load of 6,000 with a common tag and then ANDing each search query with this tag. This tag was not applied to any of the subsequently loaded assets.

Results

The graphs below show the variation of throughput as the size of the repository increases. They indicate that there is only a slight drop in throughput between 6,000 and 12,000 assets, but the drop becomes noticeable at 24,000 assets. The drop increases as the number of concurrent threads increases.

The variation in throughput is between 0% and 7% when comparing 6,000 and 12,000 assets, and between 5% and 20% when comparing 12,000 and 24,000 assets.

The overall throughput remains about the same for all repository volumes. The throughput is approximately 0.28 assets per second, or about 1,010 assets per hour.
