Tar Persistence Manager Configuration

Tar Persistence Manager

You can store CRX content in a Persistence Manager. By default, when you install CRX 1.4, the persistence manager saves the repository content to tar files.

Purpose

The Tar Persistence Manager (Tar PM) is a disk-based persistence manager that stores content in the tar file format. It is useful where data must be created and modified quickly: the Tar PM only appends to its tar files, so writes are very efficient.

Tar PM clustering lets CRX compensate for hardware and software failures by eliminating a single point of failure (one server), without requiring any changes to applications. Unlike clustering based on a database server (also available in CRX), Tar PM clustering is based on networked file systems.

Tar PM versus an RDBMS-based PM

Both Tar PM and database-based PMs support transactions, any file system, and optimization at runtime or in batch mode. Although Tar PM is a new technology, it has the following advantages over a database-based persistence manager:

Tar files are append-only.

Tar files can be backed up easily online.

Tar is a standard file format, accessible via known tools, such as tar, WinZip, and so on.

Tar is a platform-independent format.

Low cost of ownership and license.

Tar PM is specifically designed for JCR repositories.

Tar PM is faster than RDBMS-based persistence managers for the JCR use case.

The Tar PM takes advantage of the very simple key-value pair data structure of CRX.

(optional, default is 256) If the current data file grows larger than this number (in megabytes), a new data file is created. Entries are not split among files, so if the last entry in a file is very big, a data file can actually be much bigger than this value. The maximum file size is 1024 (1 GB). Data files are kept open at runtime. Depending on the amount of data stored in the Tar PM, either this value needs to be increased or the limit of open files per process needs to be adjusted. If this value is changed when tar files already exist, new tar files will grow up to this size (existing files are not changed).
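The rollover rule can be illustrated with a small simulation. This is only a sketch of the behavior described above, not CRX code: the function name and the parameter max_file_size_mb are hypothetical, and it assumes the size check happens before each entry is appended.

```python
def assign_entries_to_files(entry_sizes_mb, max_file_size_mb=256):
    """Group entry sizes (in MB) into data files using the described rollover rule.

    Assumption: a new file is started only once the current file has already
    grown past the limit, and an entry is never split across files.
    """
    files = [[]]
    current_size = 0
    for size in entry_sizes_mb:
        # Start a new data file only if the current one already exceeds the limit.
        if current_size > max_file_size_mb:
            files.append([])
            current_size = 0
        files[-1].append(size)
        current_size += size
    return files


if __name__ == "__main__":
    layout = assign_entries_to_files([100, 100, 100, 500, 10])
    for index, entries in enumerate(layout):
        print(f"file {index}: {sum(entries)} MB from entries {entries}")
```

With these sample sizes, the first file ends at 300 MB and the second at 500 MB, both above the 256 MB setting, because each entry is written whole.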

autoOptimizeAt

(optional, default is 2:00-5:00) The time at which optimization runs automatically. Example: 2:00 to start optimizing every morning at two. The index files are merged as well if required. To disable automatic optimization, set the value to "-0" (which actually means 'stop optimization at midnight').
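To make the value format concrete, the following sketch splits a value into an optional start and an optional stop time. It is an interpretation of the examples given here ("2:00-5:00", "2:00", "-0"), not the parser CRX uses; the function name is hypothetical.

```python
def parse_optimize_window(value):
    """Return (start_minutes, stop_minutes) after midnight; None means 'not given'.

    Assumption: the format is '[start][-stop]' with times written as H or H:MM.
    """
    def to_minutes(text):
        text = text.strip()
        if not text:
            return None
        hours, _, minutes = text.partition(":")
        return int(hours) * 60 + int(minutes or 0)

    start_text, _, stop_text = value.partition("-")
    return to_minutes(start_text), to_minutes(stop_text)


if __name__ == "__main__":
    for sample in ("2:00-5:00", "2:00", "-0"):
        print(sample, "->", parse_optimize_window(sample))
```

Under this reading, "-0" has no start time and a stop time of midnight, which matches the description of how it disables automatic optimization.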

bindAddress

The network interface to use for synchronization between cluster nodes. Default: empty (use all interfaces).

portList

The list of ports to use in master mode. By default any free port is used. When a firewall is in use, the ports opened in the firewall must be listed here. One port per workspace is required. A list of ports or ranges is supported, for example: 9100-9110 or 9100-9110,9210-9220. Default: 0 (any port).
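When planning firewall rules, it can help to expand a portList value and check it against the number of workspaces. The sketch below is a hypothetical helper, not part of CRX; it assumes the value is a comma-separated list of single ports and low-high ranges, as in the examples above.

```python
def parse_port_list(spec):
    """Expand a value such as '9100-9110,9210-9220' into a list of port numbers."""
    ports = []
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            low, high = (int(piece) for piece in part.split("-", 1))
            if low > high:
                raise ValueError(f"invalid port range: {part}")
            ports.extend(range(low, high + 1))
        else:
            ports.append(int(part))
    return ports


def has_enough_ports(spec, workspace_count):
    """Check the 'one port per workspace' requirement; 0 means any free port."""
    ports = parse_port_list(spec)
    if ports == [0]:
        return True
    return len(ports) >= workspace_count


if __name__ == "__main__":
    print(len(parse_port_list("9100-9110,9210-9220")))
    print(has_enough_ports("9100-9101", 3))
```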

preferredMaster

Only applicable in a clustering environment. If enabled, this cluster node will try to become the master even if another cluster node was started before. Default: false (not enabled)

lockClass

The name of the class to use for locking. The supported classes are com.day.crx.util.NativeFileLock and com.day.crx.util.CooperativeFileLock. When using a file system that does not support file locking (for example, some older versions of NFS), use the cooperative locking class. Default: com.day.crx.util.NativeFileLock

lockTimeout

When clustering is used, the maximum time (in milliseconds) to wait to lock the shared files. Default: 0 for no limit.

fileMode

The mode in which the data files are opened. Options are "rw" (read-write), "r" (read-only), "rwd" (read-write, content is written synchronously), and "rws" (read-write, content and metadata changes are written synchronously). Optionally, a + can be appended to call fsync after writing (however, this slows down writes a lot). Default: "rw" for read-write.
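The accepted fileMode values can be summarized as one of four base modes plus an optional trailing +. The following sketch validates a value along those lines; it is an illustration with a hypothetical function name, not the check CRX performs.

```python
# Base modes and their meaning, as listed in the fileMode description.
BASE_MODES = {
    "rw": "read-write",
    "r": "read-only",
    "rwd": "read-write, content written synchronously",
    "rws": "read-write, content and metadata written synchronously",
}


def parse_file_mode(mode):
    """Split a fileMode value into (base_mode, fsync_after_write)."""
    fsync = mode.endswith("+")
    base = mode[:-1] if fsync else mode
    if base not in BASE_MODES:
        raise ValueError(f"unsupported fileMode: {mode!r}")
    return base, fsync


if __name__ == "__main__":
    for sample in ("rw", "rwd+"):
        base, fsync = parse_file_mode(sample)
        print(sample, "->", BASE_MODES[base], "| fsync after write:", fsync)
```

The four base modes coincide with the mode strings accepted by Java's RandomAccessFile class.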

optimizeSleep

(optional, default is 1) The number of milliseconds to wait after optimizing a transaction. Floating point precision is supported.

maxIndexBuffer

(optional, default is 32) After an abnormal termination, at most this much data (in megabytes) needs to be scanned in order to re-create the tar entry index.

Configuration values are read when the repository is started, so restart the repository after changing the configuration.

Note: If you change the configuration after a workspace has already been created, you need to change both the repository.xml and workspace.xml files.
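Because the same setting has to be changed in more than one file, a small script can keep the files consistent. The sketch below assumes a Jackrabbit-style layout in which each PersistenceManager element holds param elements with name and value attributes; that layout, the function name, and the placeholder class name in the sample are assumptions, so compare them with your actual files first.

```python
import xml.etree.ElementTree as ET


def set_pm_param(xml_text, name, value):
    """Set (or add) a param on every PersistenceManager element in the XML text."""
    root = ET.fromstring(xml_text)
    for pm in root.iter("PersistenceManager"):
        for param in pm.findall("param"):
            if param.get("name") == name:
                param.set("value", value)
                break
        else:
            # No existing param with this name: append a new one.
            ET.SubElement(pm, "param", {"name": name, "value": value})
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical fragment; the class name is a placeholder, not the real one.
    sample = (
        '<Workspace name="default">'
        '<PersistenceManager class="placeholder.TarPM">'
        '<param name="lockTimeout" value="0"/>'
        "</PersistenceManager>"
        "</Workspace>"
    )
    print(set_pm_param(sample, "lockTimeout", "5000"))
```

Apply the same call to the contents of repository.xml and of each workspace.xml. ElementTree does not preserve the DOCTYPE declaration or comments, so treat the output as a draft to review, and keep a backup of the original files.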

Optimizing Tar Files

Consistency Checking and Fixing

The Tar PM can check repository consistency and fix consistency problems at startup. To enable consistency checking and automatically fix problems, set the following options in the repository.xml and workspace.xml, and re-start CRX:

In order to fix consistency problems, the consistency check setting must be enabled as well.

After the consistency check has finished, disable the relevant settings; otherwise the consistency check will always run when starting up CRX.

If Tar Files Get Big

If some data*.tar files are very large, the cause is large transactions. Large transactions are a problem in any case, not only for the Tar PM but also for main memory and other subsystems. You can analyze what is in a data*.tar file using the jsp file in the attachment.
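Since tar is a standard format, a generic tar reader can also give a first impression of what takes up space in a data file. The sketch below uses Python's standard tarfile module to list the largest entries. It is not a replacement for the attached jsp file, the function name is hypothetical, and it is best run against a copy of the file rather than one a running repository is writing to.

```python
import sys
import tarfile


def largest_entries(tar_path, top=10):
    """Return the 'top' largest regular-file entries as (size_in_bytes, name) pairs."""
    # Mode "r:" opens the archive as plain, uncompressed tar.
    with tarfile.open(tar_path, "r:") as archive:
        members = [(m.size, m.name) for m in archive.getmembers() if m.isfile()]
    return sorted(members, reverse=True)[:top]


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python largest_entries.py /path/to/copy/of/data-file.tar
    for size, name in largest_entries(sys.argv[1]):
        print(f"{size:>12} {name}")
```

How entry names map to repository items or transactions depends on the Tar PM's internal layout, which this sketch does not interpret.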

Migration from Regular to Clustered Environment

The easiest way to migrate to a clustered environment is to export the data, change the configuration, and then import the data.

Migration from Clustered to Regular Environment

The easiest way to migrate to a regular environment is to export the data, change the configuration, and then import the data.