Once you have Liferay installed on more than one node of your application server, there are several configuration changes that need to be made. At a minimum, Liferay should be configured in the following way for a clustered environment:

All nodes should be pointing to the same Liferay database

Jackrabbit, the JSR-170 content repository, should be:

On a shared file system available to all the nodes (not really recommended, though), or

In a database that is shared by all the nodes

Alternatively, the Document Library should be configured to use the File System Hook, and the files can be stored on a SAN for better performance.

Similarly, Lucene, the full text search indexer, should be:

On a shared file system available to all the nodes (not really recommended, though), or

In a database that is shared by all the nodes

If you have not configured your application server to use farms for deployment, the hot deploy folder should be a separate folder for all the nodes, and plugins will have to be deployed to all of the nodes individually. This can be done via a script.

Many of these configuration changes can be made by adding or modifying properties in your portal-ext.properties file. Remember that this file overrides the defaults that are in the portal.properties file. The original version of this file can be found in the Liferay source code or can be extracted from the portal-impl.jar file in your Liferay installation. It is a best practice to copy the relevant section that you want to modify from portal.properties into your portal-ext.properties file, and then modify the values there.

Note: This chapter documents a Liferay-specific cluster configuration, without getting into specific implementations of third party software, such as Java EE application servers, HTTP servers, and load balancers. Please consult the documentation for those components of your cluster for their specific configuration details. Before configuring Liferay in a cluster, make sure your operating system does not resolve your machine's hostname to 127.0.0.1.

All Nodes Should Be Pointing to the Same Liferay Database

This is pretty self-explanatory. Each node should be configured with a data source that points to one Liferay database (or a database cluster) that all of the nodes will share. This ensures that all of the nodes operate from the same basic data set. This means, of course, that Liferay cannot (and should not) use the embedded HSQL database that is shipped with the bundles. It is also best if the database server is a separate physical box from the server which is running Liferay.

Document Library Configuration

There are several options available for configuring how Liferay's document library stores files. Each option is a hook which can be configured through the portal-ext.properties file by setting the dl.hook.impl= property.

Default File System Hook

This is the hook Liferay uses to manage your document library by default. It stores documents on the file system, which has proven to be the highest performing configuration for large document libraries. You can use the file system hook in a clustered configuration, but the Advanced File System Hook is generally recommended for more complex clustering environments.

You can configure the path where your documents are stored by setting the dl.hook.file.system.root.dir= property in your portal-ext.properties.
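
For example, to store documents in a directory of your choice, you could add the following line to portal-ext.properties (the path shown is only an example; choose one appropriate to your environment):

dl.hook.file.system.root.dir=/opt/liferay/document_library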

Jackrabbit Sharing

Liferay uses Jackrabbit, a project from Apache, as its JSR-170 compliant document repository. By default, Jackrabbit is configured to store documents on the local file system of the server upon which Liferay is installed, in the $LIFERAY_HOME/data/jackrabbit folder. Inside this folder is Jackrabbit's configuration file, called repository.xml.

To simply move the default repository location to a shared folder, you do not need to edit Jackrabbit's configuration file. Instead, find the section in portal.properties labeled JCR and copy/paste that section into your portal-ext.properties file. One of the properties, by default, is the following:

jcr.jackrabbit.repository.root=${liferay.home}/data/jackrabbit

Change this property to point to a shared folder that all of the nodes can see. A new Jackrabbit configuration file will be generated in that location.

Note that because of file locking issues, this is not the best way to share Jackrabbit resources. If you have two people logged in at the same time uploading content, you could encounter data corruption using this method, and because of this, we do not recommend it for a production system. Instead, to enable better data protection, you should redirect Jackrabbit into your database of choice. You can use the Liferay database or another database for this purpose. This will require editing Jackrabbit's configuration file.

The default Jackrabbit configuration file has sections commented out for moving the Jackrabbit configuration into the database. This has been done to make it as easy as possible to enable this configuration. To move the Jackrabbit configuration into the database, simply comment out the sections relating to the file system and uncomment the sections relating to the database. By default, these sections are configured for a MySQL database. If you are using another database, you will likely need to modify the configuration, as some databases require specific changes. For example, the default configuration uses Jackrabbit's DbFileSystem class to mimic a file system in the database. While this works well in MySQL, it does not work for all databases; if you are using an Oracle database, you will need to use OracleFileSystem instead. Please see the Jackrabbit documentation at http://jackrabbit.apache.org for further information.
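
As an illustration, the file system portion of the database configuration replaces Jackrabbit's local file system class with DbFileSystem. A fragment for MySQL looks similar to the following sketch (the class and parameter names come from Jackrabbit; the driver, URL, user, and password values are placeholders you must change):

<FileSystem class="org.apache.jackrabbit.core.fs.db.DbFileSystem">
    <param name="driver" value="com.mysql.jdbc.Driver"/>
    <param name="url" value="jdbc:mysql://your-db-host/jackrabbit"/>
    <param name="user" value="jackrabbit"/>
    <param name="password" value="jackrabbit"/>
    <param name="schema" value="mysql"/>
    <param name="schemaObjectPrefix" value="J_FS_"/>
</FileSystem>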

You will also likely need to modify the JDBC database URLs so that they point to your database. Don't forget to create the database first, and to grant the user ID you are specifying in the configuration file permission to create, modify, and drop tables.

Once you have configured Jackrabbit to store its repository in a database, the next time you bring up Liferay, the necessary database tables will be created automatically. Jackrabbit, however, does not create indexes on these tables, and so over time this can be a performance penalty. To fix this, you will need to manually go into your database and index the primary key columns for all of the Jackrabbit tables.

All of your Liferay nodes should be configured to use the same Jackrabbit repository in the database. Once that is working, you can create a Jackrabbit cluster (please see the section below).

Other Storage Options

There are other options available to configure Liferay's Document Library. The default option has the best performance with large document libraries, because it simply uses the file system. If you require a JSR-170 compliant document store, you can use Jackrabbit, which can be configured to use the file system or database, depending on your needs. If, however, you have very specific configuration needs, or if you already have a content repository that you want to continue using, you might want to use one of the following options Liferay has available.

Advanced File System Hook

To use the Advanced File System Hook, set:

dl.hook.impl=com.liferay.documentlibrary.util.AdvancedFileSystemHook

This is the preferred hook for clustered environments, especially if you're using a SAN to store files.

From a performance standpoint, this method is superior to using Jackrabbit. The Advanced File System Hook distributes the files into multiple directories and thus circumvents file system limitations. Liferay does not implement any file level locking, so only use this if you're using a SAN that supports file locking (most modern ones do but check your SAN documentation to be sure).

The path for storing your documents is also set using the dl.hook.file.system.root.dir property in portal-ext.properties.
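
Put together, the relevant portal-ext.properties entries for this hook look like the following (the SAN mount point is an example):

dl.hook.impl=com.liferay.documentlibrary.util.AdvancedFileSystemHook
dl.hook.file.system.root.dir=/mnt/san/document_library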

Amazon Simple Storage

To use Amazon's Simple Storage Service to store your documents for a Liferay Portal, set

dl.hook.impl=com.liferay.documentlibrary.util.S3Hook

in portal-ext.properties. You will need to consult the Amazon Simple Storage documentation for additional details on setting it up.
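
In addition to the hook class, Liferay needs to know your S3 credentials and bucket. The property names below follow the dl.hook naming used elsewhere in portal.properties, but they are an assumption here; verify them against the S3 section of your version's portal.properties (the values are placeholders):

dl.hook.s3.access.key=your-access-key
dl.hook.s3.secret.key=your-secret-key
dl.hook.s3.bucket.name=your-bucket-name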

Documentum

You can use this hook to store your documents in Documentum. Before configuring it, you will need to install the documentum-hook plugin. Please note, however, that the documentum-hook plugin is only supported on Liferay Enterprise Edition and is currently experimental, so it may not be ready for production use. To use this hook, set

dl.hook.impl=liferay.documentum.hook.DocumentumHook

If you are using Documentum, there are additional settings that must be configured in the ${liferay_home}/documentum-hook/docroot/WEB-INF/src/dfc.properties and ${liferay_home}/documentum-hook/docroot/WEB-INF/src/portlet.properties files.

Search Configuration

You can configure search in one of two ways: use pluggable enterprise search (recommended for a cluster configuration) or configure Lucene in such a way that either the index is stored on each node's file system or is shared in a database.

Pluggable Enterprise Search

As an alternative to using Lucene, Liferay supports pluggable search engines. The first implementation of this uses the open source search engine Solr, but in the future there will be many such plugins for your search engine of choice. This allows you to use a completely separate product for search, which can be installed on another application server in your environment. Your search engine then operates completely independently of your Liferay Portal nodes in a clustered environment, acting as a search service for all of the nodes simultaneously.

This solves the problem described below with sharing Lucene indexes. You can now have one search index for all of the nodes of your cluster without having to worry about putting it in a database (if you wish, you can still do this if you configure Solr or another search engine that way) or maintaining separate search indexes on all of your nodes. Each Liferay node will send requests to the search engine to update the search index when needed, and these updates are then queued and handled automatically by the search engine, independently.

Configuring the Solr Search Server

Since Solr is a standalone search engine, you will need to download it and install it first according to the instructions on the Solr web site (http://lucene.apache.org/solr). Of course, it is best to use a server that is separate from your Liferay installation, as your Solr server will be responsible for all indexing and searching for your entire cluster. Solr is distributed as a .war file with several .jar files which need to be available on your application server's class path. Once you have Solr up and running, integrating it with Liferay is easy, but it will require a restart of your application server.

The first thing you will need to define is the location of your search index. Assuming you are running a Linux server and you have mounted a file system for the index at /solr, create an environment variable that points to this folder. This environment variable needs to be called $SOLR_HOME. So for our example, we would define:

SOLR_HOME=/solr

This environment variable can be defined anywhere you need: in your operating system's start up sequence, in the environment for the user who is logged in, or in the start up script for your application server. If you are going to use Tomcat to host Solr, you would modify setenv.sh or setenv.bat and add the environment variable there.

Once you have created the environment variable, you then can use it in your application server's start up configuration as a parameter to your JVM. This is configured differently per application server, but again, if you are using Tomcat, you would edit catalina.sh or catalina.bat and append the following to the $JAVA_OPTS variable:

-Dsolr.solr.home=$SOLR_HOME
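
On Tomcat, for example, the two steps above amount to a few lines of shell. This is a sketch only: the file names are the standard Tomcat scripts, and /solr is the example mount point from above.

```shell
# In setenv.sh: define the index location for the environment (sketch).
SOLR_HOME=/solr
export SOLR_HOME

# In catalina.sh: append the system property to the JVM options (sketch).
JAVA_OPTS="$JAVA_OPTS -Dsolr.solr.home=$SOLR_HOME"
```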

This takes care of telling Solr where to store its search index. Go ahead and install Solr on this box according to the instructions on the Solr web site. Once it's installed, shut it down, as there is some more configuration to do.

Installing the Solr Liferay Plugin

Next, you have a choice. If you have installed Solr on the same system on which Liferay is running, you can simply go to the Control Panel and install the solr-web plugin. This, however, defeats much of the purpose of using Solr, because the goal is to offload search indexing to another box in order to free up processing power for Liferay. For this reason, you should not run Liferay and your search engine on the same box. Unfortunately, the configuration in the plugin defaults to having Solr and Liferay running on the same box. To run them separately, you will have to modify a configuration file in the plugin before you install it, so that Liferay knows where to send indexing requests. In this case, go to the Liferay web site (http://www.liferay.com) and download the plugin manually.

Open or extract the plugin. Inside the plugin, you will find a file called solr-spring.xml in the WEB-INF/classes/META-INF folder. Open this file in a text editor and you will see the entry which defines where the Solr server can be found by Liferay:
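
The exact contents of this file vary with the plugin version, so treat the following as an illustrative sketch only (the bean id and class shown are assumptions); the part to look for is the constructor argument or property holding the Solr URL:

<bean id="solrServer" class="org.apache.solr.client.solrj.impl.CommonsHttpSolrServer">
    <constructor-arg type="java.lang.String" value="http://localhost:8080/solr" />
</bean>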

Modify this value so that it points to the server on which you are running Solr. Then save the file and put it back into the plugin archive in the same place it was before.

Next, extract the file schema.xml from the plugin. It should be in the docroot/WEB-INF/conf folder. This file tells Solr how to index the data coming from Liferay, and can be customized for your installation. Copy this file to $SOLR_HOME/conf (you may have to create the conf directory) on your Solr box. Now you can go ahead and start Solr.

You can now hot deploy the solr-web plugin to all of your nodes. See the next section for instructions on hot deploying to a cluster.

Once the plugin is hot deployed, your Liferay search is automatically upgraded to use Solr. It is likely, however, that initial searches will come up with nothing: this is because you will need to reindex everything using Solr.

Go to the Control Panel. In the Server section, click Server Administration. Click the Execute button next to Reindex all search indexes at the bottom of the page. It may take a while, but Liferay will begin sending indexing requests to Solr for execution. When the process is complete, Solr will have a complete search index of your site, and will be running independently of all of your Liferay nodes.

Installing the plugin to your nodes has the effect of overriding any calls to Lucene for searching. All of Liferay's search boxes will now use Solr as the search index. This is ideal for a clustered environment, as it allows all of your nodes to share one search server and one search index, and this search server operates independently of all of your nodes.

Lucene Configuration

Lucene, the search indexer which Liferay uses, can be in a shared configuration for a clustered environment, or an index can be created on each node of the cluster. The easiest configuration to implement is to have an index on each node of the cluster. Liferay provides a mechanism called ClusterLink which can send indexing requests to all nodes in the cluster to keep them in sync. This configuration does not require any additional hardware, and it performs very well. It may increase network traffic when an individual server reboots, since in that case a full reindex will be needed; but since this should only rarely happen, it's a good tradeoff if you don't have the extra hardware to implement a Solr search server.

You can enable ClusterLink by setting one property in your portal-ext.properties file:

cluster.link.enabled=true

Of course, this needs to be set on all the nodes.

If you wish to have a shared index, you will need to either share the index on the file system or in the database. This requires changing your Lucene configuration.

The Lucene configuration can be changed by modifying values in your portal-ext.properties file. Open your portal.properties file and search for the text Lucene. Copy that section and then paste it into your portal-ext.properties file.

If you wish to store the Lucene search index on a file system that is shared by all of the Liferay nodes, you can modify the location of the search index by changing the lucene.dir property. By default, this property points to the lucene folder inside the Liferay home folder:

lucene.dir=${liferay.home}/data/lucene/

Change this to the folder of your choice. To make the change take effect, you will need to restart Liferay. You can point all of the nodes to this folder, and they will use the same index.

Like Jackrabbit, however, this is not the best way to share the search index, as it could result in file corruption if different nodes try to reindex at the same time. We do not recommend this for a production system. A better way to share the index is via a database, where the database can enforce data integrity on the index. This is very easy to do; it is a simple change to your portal-ext.properties file.

There is a single property called lucene.store.type. By default this is set to go to the file system. You can change this so that the index is stored in the database by making it the following:

lucene.store.type=jdbc

The next time Liferay is started, new tables will be created in the Liferay database, and the index will be stored there. If all the Liferay nodes point to the same database tables, they will be able to share the index. Performance on this is not always as good as it could be. Your DBAs may be able to tweak the database indexes a bit to improve performance. For better performance, you should consider using a separate search server (see the section on Solr above).

Note: MySQL users need to modify their JDBC connection string for this to work. Add the following parameter to your connection string:

emulateLocators=true
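
For example, a typical MySQL connection string with this parameter added looks like the following (the host and database name are placeholders):

jdbc:mysql://your-db-host/lportal?useUnicode=true&characterEncoding=UTF-8&emulateLocators=true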

Alternatively, you can leave the configuration alone, and each node will then have its own index. This ensures that there are no collisions when multiple nodes update the index, because they all will have separate indexes. This, however, creates duplicate indexes and may not be the best use of resources. Again, for a better configuration, you should consider using a separate search server (see the section on Solr above).

Hot Deploy

Plugins which are hot deployed will need to be deployed separately to all of the Liferay nodes. Each node should, therefore, have its own hot deploy folder. This folder needs to be writable by the user under which Liferay is running, because plugins are moved from this folder to a temporary folder when they are deployed. This is to prevent the system from entering an endless loop, because the presence of a plugin in the folder is what triggers the hot deploy process.

When you want to deploy a plugin, copy that plugin to the hot deploy folders of all of the Liferay nodes. Depending on the number of nodes, it may be best to create a script to do this. Once the plugin has been deployed to all of the nodes, you can then make use of it (by adding the portlet to a page or choosing the theme as the look and feel for a page or page hierarchy).
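
Such a script can be very simple. The sketch below copies a plugin into a per-node deploy folder locally so it can be run as-is; in a real cluster you would replace the cp with scp or rsync to each node (the node names and paths are assumptions):

```shell
#!/bin/sh
# Sketch: deploy one plugin to the hot deploy folder of every node.
NODES="node1 node2"                        # illustrative node names
PLUGIN=/tmp/cluster-demo/sample-theme.war  # illustrative plugin path

# Create a dummy plugin file so the sketch is runnable end to end.
mkdir -p /tmp/cluster-demo && touch "$PLUGIN"

for NODE in $NODES; do
    DEPLOY_DIR=/tmp/cluster-demo/deploy-$NODE   # stand-in for each node's folder
    mkdir -p "$DEPLOY_DIR"
    # In a real cluster: scp "$PLUGIN" "$NODE:/opt/liferay/deploy/"
    cp "$PLUGIN" "$DEPLOY_DIR/"
done
```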

Some application servers provide a facility which allows you to deploy an application to one node, after which it is copied to all of the other nodes. If you have configured your application server to support this, you won't need to hot deploy a plugin to all of the nodes; your application server will handle it transparently. Make sure, however, that you use Liferay's hot deploy mechanism to deploy plugins, as in many cases Liferay slightly modifies plugin .war files when hot deploying them.

All of the above will get basic Liferay clustering working; however, the configuration can be further optimized. We will see how to do this next.