Overview of Storage in AEM 6

One of the most important changes in AEM 6 is the set of innovations at the repository level.

Currently, there are two node storage implementations available in AEM 6: Tar storage and MongoDB storage.

Tar Storage

Running a freshly installed AEM instance with Tar Storage

By default, AEM 6 uses Tar storage to store nodes and binaries, with the default configuration options. To configure its storage settings manually, follow the procedure below:

Download the AEM 6 quickstart jar and place it in a new folder.

Unpack AEM by running:

java -jar cq-quickstart-6.jar -unpack

Create a folder named crx-quickstart\install in the installation directory.

Create a file called org.apache.jackrabbit.oak.plugins.segment.SegmentNodeStoreService.cfg in the newly created folder.

Edit the file and set the configuration options. The following options are available for Segment Node Store, which is the basis of AEM's Tar storage implementation:

repository.home: Path to the repository home, under which various repository-related data is stored. By default, segment files are stored under the crx-quickstart/segmentstore directory.

tarmk.size: Maximum size of a segment in MB. The default is 256MB.

Start AEM.
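The configuration steps above can be sketched as a small shell script. This is only an illustration: the AEM_DIR location and the repository.home value are assumptions made for the example, and tarmk.size is simply set to the documented default. It assumes the .cfg file takes one key=value pair per line.

```shell
#!/bin/sh
# Sketch only: AEM_DIR and the repository.home value are assumptions, not required settings.
set -eu

AEM_DIR="${AEM_DIR:-./aem6}"   # folder that holds the unpacked quickstart (assumed)
TAR_INSTALL_DIR="$AEM_DIR/crx-quickstart/install"
TAR_CFG="$TAR_INSTALL_DIR/org.apache.jackrabbit.oak.plugins.segment.SegmentNodeStoreService.cfg"

# Create the install folder named in the procedure above.
mkdir -p "$TAR_INSTALL_DIR"

# Write the two documented options, one key=value pair per line.
cat > "$TAR_CFG" <<'EOF'
repository.home=crx-quickstart/repository
tarmk.size=256
EOF

echo "Wrote $TAR_CFG"
```

Run the script before the first start of AEM, then adjust the values to match your environment.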

Mongo Storage

Running a freshly installed AEM instance with Mongo Storage

AEM 6 can be configured to run with MongoDB storage by following the procedure below:

Download the AEM 6 quickstart jar and place it into a new folder.

Unpack AEM by running the following command:

java -jar cq-quickstart-6.jar -unpack

Make sure that MongoDB is installed and an instance of mongod is running. For more info, see Installing MongoDB.

Create a folder named crx-quickstart\install in the installation directory.

Configure the node store by creating a configuration file with the name of the configuration you want to use in the crx-quickstart\install directory.

The Document Node Store (which is the basis for AEM's MongoDB storage implementation) uses a file called org.apache.jackrabbit.oak.plugins.document.DocumentNodeStoreService.cfg

Edit the file and set your configuration options. The following options are available:

mongouri: The MongoURI required to connect to Mongo Database. The default is mongodb://localhost:27017

db: Name of the Mongo database. By default new AEM 6 installations use aem-author as the database name.

cache: The cache size in MB. This is distributed among various caches used in DocumentNodeStore. The default is 256

changesSize: Size in MB of capped collection used in Mongo for caching the diff output. The default is 256

customBlobStore: Boolean value indicating that a custom data store will be used. The default is false.

Create a configuration file with the PID of the data store you wish to use and edit the file in order to set the configuration options. For more info, please see Configuring Node Stores and Data Stores.

Start the AEM 6 jar with a MongoDB storage backend by running:

java -jar cq-quickstart-6.jar -r crx3,crx3mongo

Here, -r sets the backend runmode. In this example, AEM starts with MongoDB support.
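As a rough sketch of the MongoDB procedure, the script below writes a Document Node Store configuration that uses the default values listed in this section. The AEM_DIR location is an assumption made for the example, and the file is assumed to take one key=value pair per line. The start command is left as a comment because it needs the quickstart jar and a running mongod.

```shell
#!/bin/sh
# Sketch only: AEM_DIR is an assumed location; the option values are the documented defaults.
set -eu

AEM_DIR="${AEM_DIR:-./aem6}"   # folder that holds the unpacked quickstart (assumed)
MONGO_INSTALL_DIR="$AEM_DIR/crx-quickstart/install"
MONGO_CFG="$MONGO_INSTALL_DIR/org.apache.jackrabbit.oak.plugins.document.DocumentNodeStoreService.cfg"

# Create the install folder named in the procedure above.
mkdir -p "$MONGO_INSTALL_DIR"

# Write the documented options, one key=value pair per line.
cat > "$MONGO_CFG" <<'EOF'
mongouri=mongodb://localhost:27017
db=aem-author
cache=256
changesSize=256
customBlobStore=false
EOF

echo "Wrote $MONGO_CFG"

# With mongod running, start AEM from the folder that holds the jar:
#   java -jar cq-quickstart-6.jar -r crx3,crx3mongo
```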

Maintaining the Repository

Because data is never overwritten in a tar file, disk usage increases even when only existing data is updated. To compensate for the growing size of the repository, AEM employs a garbage collection mechanism called Revision Cleanup. The mechanism reclaims disk space by removing obsolete data from the repository and has three phases: estimation, compaction, and cleanup. In the past, revision cleanup was often referred to as compaction.

Offline revision cleanup is the recommended and supported way of performing revision cleanup.

Choosing the Type of Revision Cleanup

For AEM 6.2 Publish instances

Offline revision cleanup is the recommended way of cleaning up revisions. It requires shutting down the instances, so plan to run it during non-business hours.

If downtime is not possible, customers can contact Adobe Support to evaluate additional options:

If there is more than one publish instance, one can be taken down for offline revision cleanup while replication from author to that instance is avoided. After a successful revision cleanup, the instance can be put back into production, and clones of the cleaned instance can replace the remaining production instances.

If the above is still not possible, online revision cleanup can be used under the terms and conditions of the program. This type of cleanup has restricted support in AEM 6.2.

For AEM 6.2 Author instances

Offline revision cleanup is the recommended way of cleanup for author instances as well. However, in rare cases where downtime is not possible because maintenance windows were not foreseen and would have the same business impact as a system outage, customers should contact Adobe Support to evaluate additional options. The additional options for performing cleanup on author instances are the same as the ones described above for publish instances.

The oak-run tool is a runnable jar that can be run manually to compact the repository. The process is called offline revision cleanup because the repository needs to be shut down in order to run the tool properly. Make sure to plan the cleanup in accordance with your maintenance window.

Increasing the Performance of Offline Revision Cleanup

Since version 1.0.22, the oak-run tool includes several features that aim to increase the performance of the revision cleanup process and keep the maintenance window as short as possible.

The list includes several command line parameters, as described below:

-Dtar.memoryMapped. Use this to enable memory mapped operations for tar files, which greatly increases performance. You can set this to true or false. It is highly recommended that you enable this feature in order to speed up compaction.

-Dupdate.limit. Defines the threshold for the flush of a temporary transaction to disk. The default value is 5000000.

-Dcompress-interval. Number of compaction map entries to keep until compressing the current map. The default is 1000000. You should increase this value to an even higher number for faster throughput, if enough heap memory is available.

-Dcompaction-progress-log. The number of compacted nodes that will be logged. The default value is 1500000, which means that the first 1500000 compacted nodes will be logged during the operation. Use this in conjunction with the next parameter documented below.

-Dlogback.configurationFile. Use a configuration file for logging. A logback configuration file can be used to enable the logging of the nodes that are being compacted.

-Dtar.PersistCompactionMap. Set this parameter to true to use disk space instead of heap memory for compaction map persistence. Requires oak-run tool version 1.4 or higher. For further details, also see question 6 in the FAQ section.

Caution:

Memory mapped file operations do not work correctly on some versions of Windows. Make sure that you use the tool without the -Dtar.memoryMapped parameter on Windows platforms, otherwise the revision cleanup will fail.
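To show how these parameters fit together, here is a minimal sketch that assembles an oak-run command line and prints it without running it. The jar name, the segment store path, and the use of the oak-run compact mode are assumptions made for this example and are not taken from this section. The numeric values are the documented defaults. In line with the caution above, the sketch leaves out -Dtar.memoryMapped when it detects a Windows shell environment.

```shell
#!/bin/sh
# Sketch only: OAK_RUN_JAR, SEGMENT_STORE and the "compact" mode are assumptions.
set -eu

OAK_RUN_JAR="${OAK_RUN_JAR:-oak-run.jar}"                                 # hypothetical jar name
SEGMENT_STORE="${SEGMENT_STORE:-crx-quickstart/repository/segmentstore}"  # assumed path

# Leave out memory mapping on Windows-like environments (Cygwin, MinGW, MSYS).
MM_FLAG="-Dtar.memoryMapped=true"
case "$(uname -s)" in
  CYGWIN*|MINGW*|MSYS*) MM_FLAG="" ;;
esac

# Build the argument list step by step using the positional parameters.
set -- java
if [ -n "$MM_FLAG" ]; then
  set -- "$@" "$MM_FLAG"
fi
set -- "$@" \
  -Dupdate.limit=5000000 \
  -Dcompress-interval=1000000 \
  -Dcompaction-progress-log=1500000 \
  -jar "$OAK_RUN_JAR" compact "$SEGMENT_STORE"

# Print the command for review; run it only while the AEM instance is shut down.
COMPACT_CMD="$*"
echo "$COMPACT_CMD"
```

Printing the command first makes it easier to review the parameters before the maintenance window starts.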

Revision Cleanup Frequently Asked Questions

It depends on the repository growth rate. As a general rule of thumb, for average content repositories, it is recommended that you perform revision cleanup every 2 weeks for an author instance, and once per quarter for a publish instance.

3. What are the factors that determine the duration of the Offline Revision Cleanup?

The repository size and the number of revisions that need to be cleaned up determine the duration of the cleanup.

4. What's the worst that can happen if you do not perform revision cleanup?

The AEM instance will run out of disk space, which will cause outages in production. It is highly recommended that you follow the monitoring best practices as mentioned in Maintenance and Monitoring.
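As one possible way to watch for this condition, the sketch below checks how full the file system holding the repository is and prints a warning above a threshold. The REPO_DIR path and the 80 percent threshold are hypothetical values chosen for the example. It is not a replacement for the monitoring practices referenced above.

```shell
#!/bin/sh
# Sketch only: REPO_DIR and THRESHOLD_PCT are hypothetical values.
set -eu

REPO_DIR="${REPO_DIR:-.}"              # point this at the repository folder (assumed)
THRESHOLD_PCT="${THRESHOLD_PCT:-80}"   # warn at or above this usage percentage (assumed)

# df -P prints a header line, then one line per file system; column 5 is the used percentage.
USED_PCT=$(df -P "$REPO_DIR" | awk 'NR==2 { sub("%", "", $5); print $5 }')

if [ "$USED_PCT" -ge "$THRESHOLD_PCT" ]; then
  echo "WARNING: file system for $REPO_DIR is ${USED_PCT}% full; plan a revision cleanup"
else
  echo "OK: file system for $REPO_DIR is ${USED_PCT}% full"
fi
```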

5. What is the difference between a revision and a page version?

Oak revision: Oak organizes all the content in a large tree hierarchy that consists of nodes and properties. Each snapshot or revision of this content tree is immutable, and changes to the tree are expressed as a sequence of new revisions. Typically, each content modification triggers a new revision. See also http://jackrabbit.apache.org/dev/ngp.html.

Page Version: Versioning creates a "snapshot" of a page at a specific point in time. Typically, a new version is created when a page is activated. For more information, see Working with Page Versions.

6. How can you speed up the Offline Revision Cleanup task if it does not complete within 8 hours?